summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2025-03-24Merge tag 'hardening-v6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As usual, it's scattered changes all over. Patches touching things outside of our traditional areas in the tree have been Acked by maintainers or were trivial changes: - loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel) - samples/check-exec: Fix script name (Mickaël Salaün) - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov) - lib/string_choices: Sort by function name (R Sundar) - hardening: Allow default HARDENED_USERCOPY to be set at compile time (Mel Gorman) - uaccess: Split out compile-time checks into ucopysize.h - kbuild: clang: Support building UM with SUBARCH=i386 - x86: Enable i386 FORTIFY_SOURCE on Clang 16+ - ubsan/overflow: Rework integer overflow sanitizer option - Add missing __nonstring annotations for callers of memtostr*()/strtomem*() - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it - Introduce __nonstring_array for silencing future GCC 15 warnings" * tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) compiler_types: Introduce __nonstring_array hardening: Enable i386 FORTIFY_SOURCE on Clang 16+ x86/build: Remove -ffreestanding on i386 with GCC ubsan/overflow: Enable ignorelist parsing and add type filter ubsan/overflow: Enable pattern exclusions ubsan/overflow: Rework integer overflow sanitizer option to turn on everything samples/check-exec: Fix script name yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl() kbuild: clang: Support building UM with SUBARCH=i386 loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported lib/string_choices: Rearrange functions in sorted order string.h: Validate memtostr*()/strtomem*() arguments more carefully compiler.h: Introduce __must_be_noncstr() nilfs2: Mark on-disk strings as nonstring uapi: stddef.h: Introduce __kernel_nonstring x86/tdx: Mark message.bytes as nonstring string: kunit: Mark nonstring test strings as __nonstring scsi: qla2xxx: Mark device strings as nonstring scsi: mpt3sas: Mark device strings as nonstring scsi: mpi3mr: Mark device strings as nonstring ...
2025-03-24Merge tag 'vfs-6.15-rc1.initramfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs initramfs updates from Christian Brauner: "This adds basic kunit test coverage for initramfs unpacking and cleans up some buffer handling issues and inefficiencies" * tag 'vfs-6.15-rc1.initramfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: append initramfs files to the VFS section initramfs: avoid static buffer for error message initramfs: fix hardlink hash leak without TRAILER initramfs: reuse name_len for dir mtime tracking initramfs: allocate heap buffers together initramfs: avoid memcpy for hex header fields vsprintf: add simple_strntoul initramfs_test: kunit tests for initramfs unpacking init: add initramfs_internal.h
2025-03-19Merge tag 'v6.14-rc7' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-12compiler_types: Introduce __nonstring_arrayKees Cook
GCC has expanded support of the "nonstring" attribute so that it can be applied to arrays of character arrays[1], which is needed to identify correct static initialization of those kinds of objects. Since this was not supported prior to GCC 15, we need to distinguish the usage of Linux's existing __nonstring macro for the attribute for non-multi-dimensional char arrays. Until GCC 15 is the minimum version, use __nonstring_array to mark arrays of non-string character arrays. (Regular non-string character arrays can continue to use __nonstring.) Once GCC 15 is the minimum compiler version we can replace all uses of __nonstring_array with just __nonstring and remove this macro. This allows for changes like this: -static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst = { +static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst = { ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, Which will silence the coming -Wunterminated-string-initialization warnings in GCC 15: In file included from ../include/acpi/actbl.h:371, from ../include/acpi/acpi.h:26, from ../include/linux/acpi.h:26, from ../drivers/acpi/tables.c:19: ../include/acpi/actbl1.h:30:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization] 30 | #define ACPI_SIG_BERT "BERT" /* Boot Error Record Table */ | ^~~~~~ ../drivers/acpi/tables.c:400:9: note: in expansion of macro 'ACPI_SIG_BERT' 400 | ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, | ^~~~~~~~~~~~~ ../include/acpi/actbl1.h:31:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization] 31 | #define ACPI_SIG_BGRT "BGRT" /* Boot Graphics Resource Table */ | ^~~~~~ Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20250310214244.work.194-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-11rust: Disallow BTF generation with Rust + LTOMatthew Maurer
The kernel cannot currently self-parse BTF containing Rust debug information. pahole uses the language of the CU to determine whether to filter out debug information when generating the BTF. When LTO is enabled, Rust code can cross CU boundaries, resulting in Rust debug information in CUs labeled as C. This results in a system which cannot parse its own BTF. Signed-off-by: Matthew Maurer <mmaurer@google.com> Cc: stable@vger.kernel.org Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole") Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-03-08initramfs: avoid static buffer for error messageDavid Disseldorp
The dynamic error message printed if CONFIG_RD_$ALG compression support is missing needn't be propagated up to the caller via a static buffer. Print it immediately via pr_err() and set @message to a const string to flag error. Before: text data bss dec hex filename 8006 1118 8 9132 23ac init/initramfs.o After: text data bss dec hex filename 7938 1022 8 8968 2308 init/initramfs.o Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-9-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08initramfs: fix hardlink hash leak without TRAILERDavid Disseldorp
Covered in Documentation/driver-api/early-userspace/buffer-format.rst , initramfs archives can carry an optional "TRAILER!!!" entry which serves as a boundary for collecting and associating hardlinks with matching inode and major / minor device numbers. Although optional, if hardlinks are found in an archive without a subsequent "TRAILER!!!" entry then the hardlink state hash table is leaked, e.g. unfixed kernel, with initramfs_test.c hunk applied only: unreferenced object 0xffff9405408cc000 (size 8192): comm "kunit_try_catch", pid 53, jiffies 4294892519 hex dump (first 32 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 ff 81 00 00 ................ 00 00 00 00 00 00 00 00 69 6e 69 74 72 61 6d 66 ........initramf backtrace (crc a9fb0ee0): [<0000000066739faa>] __kmalloc_cache_noprof+0x11d/0x250 [<00000000fc755219>] maybe_link.part.5+0xbc/0x120 [<000000000526a128>] do_name+0xce/0x2f0 [<00000000145c1048>] write_buffer+0x22/0x40 [<000000003f0b4f32>] unpack_to_rootfs+0xf9/0x2a0 [<00000000d6f7e5af>] initramfs_test_hardlink+0xe3/0x3f0 [<0000000014fde8d6>] kunit_try_run_case+0x5f/0x130 [<00000000dc9dafc5>] kunit_generic_run_threadfn_adapter+0x18/0x30 [<000000001076c239>] kthread+0xc8/0x100 [<00000000d939f1c1>] ret_from_fork+0x2b/0x40 [<00000000f848ad1a>] ret_from_fork_asm+0x1a/0x30 Fix this by calling free_hash() after initramfs buffer processing in unpack_to_rootfs(). An extra hardlink_seen global is added as an optimization to avoid walking the 32 entry hash array unnecessarily. The expectation is that a "TRAILER!!!" entry will normally be present, and initramfs hardlinks are uncommon. There is one user facing side-effect of this fix: hardlinks can currently be associated across built-in and external initramfs archives, *if* the built-in initramfs archive lacks a "TRAILER!!!" terminator. I'd consider this cross-archive association broken, but perhaps it's used. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-8-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08initramfs: reuse name_len for dir mtime trackingDavid Disseldorp
We already have a nulterm-inclusive, checked name_len for the directory name, so use that instead of calling strlen(). Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-7-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08initramfs: allocate heap buffers togetherDavid Disseldorp
header_buf, symlink_buf and name_buf all share the same lifecycle so needn't be allocated / freed separately. This change leads to a minor reduction in .text size: before: text data bss dec hex filename 7914 1110 8 9032 2348 init/initramfs.o after: text data bss dec hex filename 7854 1110 8 8972 230c init/initramfs.o A previous iteration of this patch reused a single buffer instead of three, given that buffer use is state-sequential (GotHeader, GotName, GotSymlink). However, the slight decrease in heap use during early boot isn't really worth the extra review complexity. Link: https://lore.kernel.org/all/20241107002044.16477-7-ddiss@suse.de/ Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-6-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08initramfs: avoid memcpy for hex header fieldsDavid Disseldorp
newc/crc cpio headers contain a bunch of 8-character hexadecimal fields which we convert via simple_strtoul(), following memcpy() into a zero-terminated stack buffer. The new simple_strntoul() helper allows us to pass in max_chars=8 to avoid zero-termination and memcpy(). Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-5-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08initramfs_test: kunit tests for initramfs unpackingDavid Disseldorp
Provide some basic initramfs unpack sanity tests covering: - simple file / dir extraction - filename field overrun, as reported and fixed separately via https://lore.kernel.org/r/20241030035509.20194-2-ddiss@suse.de - "070702" cpio data checksums - hardlinks Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-3-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04rcu-tasks: Move RCU Tasks self-tests to core_initcall()Paul E. McKenney
The timer and hrtimer softirq processing has moved to dedicated threads for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in timers not expiring until later in early boot, which in turn causes the RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, which further causes the entire kernel to hang. One fix would be to make timers work during this time, but there are no known users of RCU Tasks grace periods during that time, so no justification for the added complexity. Not yet, anyway. This commit therefore moves the call to rcu_init_tasks_generic() from kernel_init_freeable() to a core_initcall(). This works because the timer and hrtimer kthreads are created at early_initcall() time. Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: <linux-trace-kernel@vger.kernel.org> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04init: add initramfs_internal.hDavid Disseldorp
The new header only exports a single unpack function and a CPIO_HDRLEN constant for future test use. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250304061020.9815-2-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-18kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPUBrian Gerst
x86-64 was the only user. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-16-brgerst@gmail.com
2025-02-18x86/percpu/64: Use relative percpu offsetsBrian Gerst
The percpu section is currently linked at absolute address 0, because older compilers hard-coded the stack protector canary value at a fixed offset from the start of the GS segment. Now that the canary is a normal percpu variable, the percpu section does not need to be linked at a specific address. x86-64 will now calculate the percpu offsets as the delta between the initial percpu address and the dynamically allocated memory, like other architectures. Note that GSBASE is limited to the canonical address width (48 or 57 bits, sign-extended). As long as the kernel text, modules, and the dynamically allocated percpu memory are all in the negative address space, the delta will not overflow this limit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-9-brgerst@gmail.com
2025-01-31Merge tag 'kbuild-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support multiple hook locations for maint scripts of Debian package - Remove 'cpio' from the build tool requirement - Introduce gendwarfksyms tool, which computes CRCs for export symbols based on the DWARF information - Support CONFIG_MODVERSIONS for Rust - Resolve all conflicts in the genksyms parser - Fix several syntax errors in genksyms * tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits) kbuild: fix Clang LTO with CONFIG_OBJTOOL=n kbuild: Strip runtime const RELA sections correctly kconfig: fix memory leak in sym_warn_unmet_dep() kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST genksyms: fix syntax error for attribute before init-declarator genksyms: fix syntax error for builtin (u)int*x*_t types genksyms: fix syntax error for attribute after 'union' genksyms: fix syntax error for attribute after 'struct' genksyms: fix syntax error for attribute after abstact_declarator genksyms: fix syntax error for attribute before nested_declarator genksyms: fix syntax error for attribute before abstract_declarator genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier genksyms: record attributes consistently for init-declarator genksyms: restrict direct-declarator to take one parameter-type-list genksyms: restrict direct-abstract-declarator to take one parameter-type-list genksyms: remove Makefile hack genksyms: fix last 3 shift/reduce conflicts genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts genksyms: reduce type_qualifier directly to decl_specifier genksyms: rename cvar_qualifier to type_qualifier ...
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-27Merge tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Simona Vetter: "cgroup: - fix Koncfig fallout from new dmem controller Driver Changes: - v3d NULL pointer regression fix in fence signalling race - virtio: uaf in dma_buf free path - xlnx: fix kerneldoc - bochs: fix double-free on driver removal - zynqmp: add missing locking to DP bridge driver - amdgpu fixes all over: - documentation, display, sriov, various hw block drivers - use drm/sched helper - mark some debug module options as unsafe - amdkfd: mark some debug module options as unsafe, trap handler updates, fix partial migration handling DRM core: - fix fbdev Kconfig select rules, improve tiled-based display support" * tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernel: (40 commits) drm/amd/display: Optimize cursor position updates drm/amd/display: Add hubp cache reset when powergating drm/amd/amdgpu: Enable scratch data dump for mes 12 drm/amd: Clarify kdoc for amdgpu.gttsize drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment drm/amd/pm: Fix smu v13.0.6 caps initialization drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2" revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic" drm/amd/pm: Add capability flags for SMU v13.0.6 drm/amd/display: fix SUBVP DC_DEBUG_MASK documentation drm/amd/display: fix CEC DC_DEBUG_MASK documentation drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL drm/amdgpu: cache gpu pcie link width drm/amd/display: mark static functions noinline_for_stack drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler drm/amdgpu: Refine ip detection log message drm/amdgpu: Add handler for SDMA context empty ...
2025-01-26Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-26Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-25mm/memblock: add memblock_alloc_or_panic interfaceGuo Weikang
Before SLUB initialization, various subsystems used memblock_alloc to allocate memory. In most cases, when memory allocation fails, an immediate panic is required. To simplify this behavior and reduce repetitive checks, introduce `memblock_alloc_or_panic`. This function ensures that memory allocation failures result in a panic automatically, improving code readability and consistency across subsystems that require this behavior. [guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic] Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/ Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.cMichal Koutný
The cpuset file is a legacy attribute that is bound primarily to cpuset v1 hierarchy (equivalent information is available in /proc/$pid/cgroup path on the unified hierarchy in conjunction with respective cgroup.controllers showing where cpuset controller is enabled). Followup to commit b0ced9d378d49 ("cgroup/cpuset: move v1 interfaces to cpuset-v1.c") and hide CONFIG_PROC_PID_CPUSET under CONFIG_CPUSETS_V1. Drop an obsolete comment too. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-01-21Merge tag 'rust-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Finish the move to custom FFI integer types started in the previous cycle and finally map 'long' to 'isize' and 'char' to 'u8'. Do a few cleanups on top thanks to that. - Start to use 'derive(CoercePointee)' on Rust >= 1.84.0. This is a major milestone on the path to build the kernel using only stable Rust features. In particular, previously we were using the unstable features 'coerce_unsized', 'dispatch_from_dyn' and 'unsize', and now we will use the new 'derive_coerce_pointee' one, which is on track to stabilization. This new feature is a macro that essentially expands into code that internally uses the unstable features that we were using before, without having to expose those. With it, stable Rust users, including the kernel, will be able to build custom smart pointers that work with trait objects, e.g.: fn f(p: &Arc<dyn Display>) { pr_info!("{p}\n"); } let a: Arc<dyn Display> = Arc::new(42i32, GFP_KERNEL)?; let b: Arc<dyn Display> = Arc::new("hello there", GFP_KERNEL)?; f(&a); // Prints "42". f(&b); // Prints "hello there". Together with the 'arbitrary_self_types' feature that we started using in the previous cycle, using our custom smart pointers like 'Arc' will eventually only rely in stable Rust. - Introduce 'PROCMACROLDFLAGS' environment variable to allow to link Rust proc macros using different flags than those used for linking Rust host programs (e.g. when 'rustc' uses a different C library than the host programs' one), which Android needs. - Help kernel builds under macOS with Rust enabled by accomodating other naming conventions for dynamic libraries (i.e. '.so' vs. '.dylib') which are used for Rust procedural macros. The actual support for macOS (i.e. the rest of the pieces needed) is provided out-of-tree by others, following the policy used for other parts of the kernel by Kbuild. - Run Clippy for 'rusttest' code too and clean the bits it spotted. - Provide Clippy with the minimum supported Rust version to improve the suggestions it gives. - Document 'bindgen' 0.71.0 regression. 'kernel' crate: - 'build_error!': move users of the hidden function to the documented macro, prevent such uses in the future by moving the function elsewhere and add the macro to the prelude. - 'types' module: add improved version of 'ForeignOwnable::borrow_mut' (which was removed in the past since it was problematic); change 'ForeignOwnable' pointer type to '*mut'. - 'alloc' module: implement 'Display' for 'Box' and align the 'Debug' implementation to it; add example (doctest) for 'ArrayLayout::new()' - 'sync' module: document 'PhantomData' in 'Arc'; use 'NonNull::new_unchecked' in 'ForeignOwnable for Arc' impl. - 'uaccess' module: accept 'Vec's with different allocators in 'UserSliceReader::read_all'. - 'workqueue' module: enable run-testing a couple more doctests. - 'error' module: simplify 'from_errno()'. - 'block' module: fix formatting in code documentation (a lint to catch these is being implemented). - Avoid 'unwrap()'s in doctests, which also improves the examples by showing how kernel code is supposed to be written. - Avoid 'as' casts with 'cast{,_mut}' calls which are a bit safer. And a few other cleanups" * tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (32 commits) kbuild: rust: add PROCMACROLDFLAGS rust: uaccess: generalize userSliceReader to support any Vec rust: kernel: add improved version of `ForeignOwnable::borrow_mut` rust: kernel: reorder `ForeignOwnable` items rust: kernel: change `ForeignOwnable` pointer to mut rust: arc: split unsafe block, add missing comment rust: types: avoid `as` casts rust: arc: use `NonNull::new_unchecked` rust: use derive(CoercePointee) on rustc >= 1.84.0 rust: alloc: add doctest for `ArrayLayout::new()` rust: init: update `stack_try_pin_init` examples rust: error: import `kernel`'s `LayoutError` instead of `core`'s rust: str: replace unwraps with question mark operators rust: page: remove unnecessary helper function from doctest rust: rbtree: remove unwrap in asserts rust: init: replace unwraps with question mark operators rust: use host dylib naming convention to support macOS rust: add `build_error!` to the prelude rust: kernel: move `build_error` hidden function to prevent mistakes rust: use the `build_error!` macro, not the hidden function ...
2025-01-21Merge tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "There are two external interactions of note, the msm tree pull in some opp tree, hopefully the opp tree arrives from the same git tree however it normally does. There is also a new cgroup controller for device memory, that is used by drm, so is merging through my tree. This will hopefully help open up gpu cgroup usage a bit more and move us forward. There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs. Then the usual xe/amdgpu/i915/msm leaders and lots of changes and refactors across the board: core: - device memory cgroup controller added - Remove driver date from drm_driver - Add drm_printer based hex dumper - drm memory stats docs update - scheduler documentation improvements new driver: - amdxdna - Ryzen AI NPU support connector: - add a mutex to protect ELD - make connector setup two-step panels: - Introduce backlight quirks infrastructure - New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00, - Multi-Inno Technology MI1010Z1T-1CP11 bridge: - ti-sn65dsi83: Add ti,lvds-vod-swing optional properties - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support xe: - make OA buffer size configurable - GuC capture fixes - add ufence and g2h flushes - restore system memory GGTT mappings - ioctl fixes - SRIOV PF scheduling priority - allow fault injection - lots of improvements/refactors - Enable GuC's WA_DUAL_QUEUE for newer platforms - IRQ related fixes and improvements i915: - More accurate engine busyness metrics with GuC submission - Ensure partial BO segment offset never exceeds allowed max - Flush GuC CT receive tasklet during reset preparation - Some DG2 refactor to fix DG2 bugs when operating with certain CPUs - Fix DG1 power gate sequence - Enabling uncompressed 128b/132b UHBR SST - Handle hdmi connector init failures, and no HDMI/DP cases - More robust engine resets on Haswell and older i915/xe display: - HDCP fixes for Xe3Lpd - New GSC FW ARL-H/ARL-U - support 3 VDSC engines 12 slices - MBUS joining sanitisation - reconcile i915/xe display power mgmt - Xe3Lpd fixes - UHBR rates for Thunderbolt amdgpu: - DRM panic support - track BO memory stats at runtime - Fix max surface handling in DC - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates - Fix doorbell ttm cleanup - RAS updates - ISP updates - SDMA queue reset support - Rework DPM powergating interfaces - Documentation updates and cleanups - DCN 3.5 updates - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate - Add debugfs interfaces for forcing scheduling to specific engine instances - GG 9.5 updates - IH 4.4 updates - Make missing optional firmware less noisy - PSP 13.x updates - SMU 13.x updates - VCN 5.x updates - JPEG 5.x updates - GC 12.x updates - DC FAMS updates amdkfd: - GG 9.5 updates - Logging improvements - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix msm: - MDSS: - properly described UBWC registers - added SM6150 (aka QCS615) support - DPU: - added SM6150 (aka QCS615) support - enabled wide planes if virtual planes are enabled (by using two SSPPs for a single plane) - added CWB hardware blocks support - DSI: - added SM6150 (aka QCS615) support - GPU: - Print GMU core fw version - GMU bandwidth voting for a740 and a750 - Expose uche trap base via uapi - UAPI error reporting rcar-du: - Add r8a779h0 Support ivpu: - Fix qemu crash when using passthrough nouveau: - expose GSP-RM logging buffers via debugfs panfrost: - Add MT8188 Mali-G57 MC3 support rockchip: - Gamma LUT support hisilicon: - new HIBMC support virtio-gpu: - convert to helpers - add prime support for scanout buffers v3d: - Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL vc4: - Add support for BCM2712 vkms: - line-per-line compositing algorithm to improve performance zynqmp: - Add DP audio support mediatek: - dp: Add sdp path reset - dp: Support flexible length of DP calibration data etnaviv: - add fdinfo memory support - add explicit reset handling" * tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits) drm/bridge: fix documentation for the hdmi_audio_prepare() callback doc/cgroup: Fix title underline length drm/doc: Include new drm-compute documentation cgroup/dmem: Fix parameters documentation cgroup/dmem: Select PAGE_COUNTER kernel/cgroup: Remove the unused variable climit drm/display: hdmi: Do not read EDID on disconnected connectors drm/tests: hdmi: Add connector disablement test drm/connector: hdmi: Do atomic check when necessary drm/amd/display: 3.2.316 drm/amd/display: avoid reset DTBCLK at clock init drm/amd/display: improve dpia pre-train drm/amd/display: Apply DML21 Patches drm/amd/display: Use HW lock mgr for PSR1 drm/amd/display: Revised for Replay Pseudo vblank control drm/amd/display: Add a new flag for replay low hz drm/amd/display: Remove unused read_ono_state function from Hwss module drm/amd/display: Do not elevate mem_type change to full update drm/amd/display: Do not wait for PSR disable on vbl enable drm/amd/display: Remove unnecessary eDP power down ...
2025-01-18rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONSSami Tolvanen
Previously, two things stopped Rust from using MODVERSIONS: 1. Rust symbols are occasionally too long to be represented in the original versions table 2. Rust types cannot be properly hashed by the existing genksyms approach because: * Looking up type definitions in Rust is more complex than C * Type layout is potentially dependent on the compiler in Rust, not just the source type declaration. CONFIG_EXTENDED_MODVERSIONS addresses the first point, and CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow it to do so by selecting both features. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Co-developed-by: Matthew Maurer <mmaurer@google.com> Signed-off-by: Matthew Maurer <mmaurer@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-17cgroup/rdma: Drop bogus PAGE_COUNTER selectGeert Uytterhoeven
When adding the Device memory controller (DMEM), "select PAGE_COUNTER" was added to CGROUP_RDMA, presumably instead of CGROUP_DMEM. While commit e33b51499a0a6bca ("cgroup/dmem: Select PAGE_COUNTER") added the missing select to CGROUP_DMEM, the bogus select is still there. Remove it. Fixes: b168ed458ddecc17 ("kernel/cgroup: Add "dmem" memory accounting cgroup") Closes: https://lore.kernel.org/CAMuHMdUmPfahsnZwx2iB5yfh8rjjW25LNcnYujNBgcKotUXBNg@mail.gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Tejun Heo <tj@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/b4d462f038a2f895f30ae759928397c8183f6f7e.1737020925.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15cgroup/dmem: Select PAGE_COUNTERMaxime Ripard
The dmem cgroup the page counting API implemented behing the PAGE_COUNTER kconfig option. However, it doesn't select it, resulting in potential build breakages. Select PAGE_COUNTER. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501111330.3VuUx8vf-lkp@intel.com/ Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-1-mripard@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-13rust: use derive(CoercePointee) on rustc >= 1.84.0Xiangfei Ding
The `kernel` crate relies on both `coerce_unsized` and `dispatch_from_dyn` unstable features. Alice Ryhl has proposed [1] the introduction of the unstable macro `SmartPointer` to reduce such dependence, along with a RFC patch [2]. Since Rust 1.81.0 this macro, later renamed to `CoercePointee` in Rust 1.84.0 [3], has been fully implemented with the naming discussion resolved. This feature is now on track to stabilization in the language. In order to do so, we shall start using this macro in the `kernel` crate to prove the functionality and utility of the macro as the justification of its stabilization. This patch makes this switch in such a way that the crate remains backward compatible with older Rust compiler versions, via the new Kconfig option `RUSTC_HAS_COERCE_POINTEE`. A minimal demonstration example is added to the `samples/rust/rust_print_main.rs` module. Link: https://rust-lang.github.io/rfcs/3621-derive-smart-pointer.html [1] Link: https://lore.kernel.org/all/20240823-derive-smart-pointer-v1-1-53769cd37239@google.com/ [2] Link: https://github.com/rust-lang/rust/pull/131284 [3] Signed-off-by: Xiangfei Ding <dingxiangfei2009@gmail.com> Reviewed-by: Fiona Behrens <me@kloenk.dev> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20241203205050.679106-2-dingxiangfei2009@gmail.com [ Fixed version to 1.84. Renamed option to `RUSTC_HAS_COERCE_POINTEE` to match `CC_HAS_*` ones. Moved up new config option, closer to the `CC_HAS_*` ones. Simplified Kconfig line. Fixed typos and slightly reworded example and commit. Added Link to PR. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-12init: fix removal warning for deprecated initrd loadingMartin Kepplinger
This won't be removed in 2021, no matter how hard we try. Link: https://lkml.kernel.org/r/20241218123638.34907-1-martink@posteo.de Signed-off-by: Martin Kepplinger <martink@posteo.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Joel Granados <joel.granados@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-11rcu/kvfree: Initialize kvfree_rcu() separatelyUladzislau Rezki (Sony)
Introduce a separate initialization of kvfree_rcu() functionality. For such purpose a kfree_rcu_batch_init() is renamed to a kvfree_rcu_init() and it is invoked from the main.c right after rcu_init() is done. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-01-10rust: document `bindgen` 0.71.0 regressionMiguel Ojeda
`bindgen` 0.71.0 regressed [1] on the "`--version` requires header" issue which appeared in 0.69.0 first [2] and was fixed in 0.69.1. It has been fixed again in 0.71.1 [3]. Thus document it so that, when we upgrade the minimum past 0.69.0 in the future, we do not forget that we cannot remove the workaround until we arrive at 0.71.1 at least. Link: https://github.com/rust-lang/rust-bindgen/issues/3039 [1] Link: https://github.com/rust-lang/rust-bindgen/issues/2677 [2] Link: https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md#v0711-2024-12-09 [3] Reviewed-by: Fiona Behrens <me@kloenk.dev> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20241209212544.1977065-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-06kernel/cgroup: Add "dmem" memory accounting cgroupMaarten Lankhorst
This code is based on the RDMA and misc cgroup initially, but now uses page_counter. It uses the same min/low/max semantics as the memory cgroup as a result. There's a small mismatch as TTM uses u64, and page_counter long pages. In practice it's not a problem. 32-bits systems don't really come with >=4GB cards and as long as we're consistently wrong with units, it's fine. The device page size may not be in the same units as kernel page size, and each region might also have a different page size (VRAM vs GART for example). The interface is simple: - Call dmem_cgroup_register_region() - Use dmem_cgroup_try_charge to check if you can allocate a chunk of memory, use dmem_cgroup__uncharge when freeing it. This may return an error code, or -EAGAIN when the cgroup limit is reached. In that case a reference to the limiting pool is returned. - The limiting cs can be used as compare function for dmem_cgroup_state_evict_valuable. - After having evicted enough, drop reference to limiting cs with dmem_cgroup_pool_state_put. This API allows you to limit device resources with cgroups. You can see the supported cards in /sys/fs/cgroup/dmem.capacity You need to echo +dmem to cgroup.subtree_control, and then you can partition device memory. Co-developed-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Co-developed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20241204143112.1250983-1-dev@lankhorst.se Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-11-25Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ...
2024-11-25Merge tag 'hardening-v6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Disable __counted_by in Clang < 19.1.3 (Jan Hendrik Farr) - string_helpers: Silence output truncation warning (Bartosz Golaszewski) - compiler.h: Avoid needing BUILD_BUG_ON_ZERO() (Philipp Reisner) - MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be} (Thorsten Blum) * tag 'hardening-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Compiler Attributes: disable __counted_by for clang < 19.1.3 compiler.h: Fix undefined BUILD_BUG_ON_ZERO() lib: string_helpers: silence snprintf() output truncation warning MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be}
2024-11-22Merge tag 'trace-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Addition of faultable tracepoints There's a tracepoint attached to both a system call entry and exit. This location is known to allow page faults. The tracepoints are called under an rcu_read_lock() which does not allow faults that can sleep. This limits the ability of tracepoint handlers to page fault in user space system call parameters. Now these tracepoints have been made "faultable", allowing the callbacks to fault in user space parameters and record them. Note, only the infrastructure has been implemented. The consumers (perf, ftrace, BPF) now need to have their code modified to allow faults. - Fix up of BPF code for the tracepoint faultable logic - Update tracepoints to use the new static branch API - Remove trace_*_rcuidle() variants and the SRCU protection they used - Remove unused TRACE_EVENT_FL_FILTERED logic - Replace strncpy() with strscpy() and memcpy() - Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr() - Fix perf events to not duplicate samples when tracing is enabled - Replace atomic64_add_return(1, counter) with atomic64_inc_return(counter) - Make stack trace buffer 4K instead of PAGE_SIZE - Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used - Get the true return address for function tracer when function graph tracer is also running. When function_graph trace is running along with function tracer, the parent function of the function tracer sometimes is "return_to_handler", which is the function graph trampoline to record the exit of the function. Use existing logic that calls into the fgraph infrastructure to find the real return address. - Remove (un)regfunc pointers out of tracepoint structure - Added last minute bug fix for setting pending modules in stack function filter. echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter Would cause a kernel NULL dereference. - Minor clean ups * tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits) ftrace: Fix regression with module command in stack_trace_filter tracing: Fix function name for trampoline ftrace: Get the true parent ip for function tracer tracing: Remove redundant check on field->field in histograms bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links bpf: decouple BPF link/attach hook and BPF program sleepable semantics bpf: put bpf_link's program when link is safe to be deallocated tracing: Replace strncpy() with strscpy() when copying comm tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL tracing: Fix syscall tracepoint use-after-free tracing: Introduce tracepoint_is_faultable() tracing: Introduce tracepoint extended structure tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT tracing: Replace multiple deprecated strncpy with memcpy tracing: Make percpu stack trace buffer invariant to PAGE_SIZE tracing: Use atomic64_inc_return() in trace_clock_counter() trace/trace_event_perf: remove duplicate samples on the first tracepoint event tracing/bpf: Add might_fault check to syscall probes tracing/perf: Add might_fault check to syscall probes tracing/ftrace: Add might_fault check to syscall probes ...
2024-11-20Merge tag 'printk-for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Print more precise information about the printk log buffer memory usage. - Make sure that the sysrq title is shown on the console even when deferred. - Do not enable earlycon by `console=` which is meant to disable the default console. * tag 'printk-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: add dummy printk_force_console_enter/exit helpers tty: sysrq: Use printk_force_console context on __handle_sysrq printk: Introduce FORCE_CON flag printk: Improve memory usage logging during boot init: Don't proxy `console=` to earlycon
2024-11-19Merge tag 'timers-core-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather large update for timekeeping and timers: - The final step to get rid of auto-rearming posix-timers posix-timers are currently auto-rearmed by the kernel when the signal of the timer is ignored so that the timer signal can be delivered once the corresponding signal is unignored. This requires to throttle the timer to prevent a DoS by small intervals and keeps the system pointlessly out of low power states for no value. This is a long standing non-trivial problem due to the lock order of posix-timer lock and the sighand lock along with life time issues as the timer and the sigqueue have different life time rules. Cure this by: - Embedding the sigqueue into the timer struct to have the same life time rules. Aside of that this also avoids the lookup of the timer in the signal delivery and rearm path as it's just a always valid container_of() now. - Queuing ignored timer signals onto a seperate ignored list. - Moving queued timer signals onto the ignored list when the signal is switched to SIG_IGN before it could be delivered. - Walking the ignored list when SIG_IGN is lifted and requeue the signals to the actual signal lists. This allows the signal delivery code to rearm the timer. This also required to consolidate the signal delivery rules so they are consistent across all situations. With that all self test scenarios finally succeed. - Core infrastructure for VFS multigrain timestamping This is required to allow the kernel to use coarse grained time stamps by default and switch to fine grained time stamps when inode attributes are actively observed via getattr(). These changes have been provided to the VFS tree as well, so that the VFS specific infrastructure could be built on top. - Cleanup and consolidation of the sleep() infrastructure - Move all sleep and timeout functions into one file - Rework udelay() and ndelay() into proper documented inline functions and replace the hardcoded magic numbers by proper defines. - Rework the fsleep() implementation to take the reality of the timer wheel granularity on different HZ values into account. Right now the boundaries are hard coded time ranges which fail to provide the requested accuracy on different HZ settings. - Update documentation for all sleep/timeout related functions and fix up stale documentation links all over the place - Fixup a few usage sites - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks A system can have multiple PTP clocks which are participating in seperate and independent PTP clock domains. So far the kernel only considers the PTP clock which is based on CLOCK TAI relevant as that's the clock which drives the timekeeping adjustments via the various user space daemons through adjtimex(2). The non TAI based clock domains are accessible via the file descriptor based posix clocks, but their usability is very limited. They can't be accessed fast as they always go all the way out to the hardware and they cannot be utilized in the kernel itself. As Time Sensitive Networking (TSN) gains traction it is required to provide fast user and kernel space access to these clocks. The approach taken is to utilize the timekeeping and adjtimex(2) infrastructure to provide this access in a similar way how the kernel provides access to clock MONOTONIC, REALTIME etc. Instead of creating a duplicated infrastructure this rework converts timekeeping and adjtimex(2) into generic functionality which operates on pointers to data structures instead of using static variables. This allows to provide time accessors and adjtimex(2) functionality for the independent PTP clocks in a subsequent step. - Consolidate hrtimer initialization hrtimers are set up by initializing the data structure and then seperately setting the callback function for historical reasons. That's an extra unnecessary step and makes Rust support less straight forward than it should be. Provide a new set of hrtimer_setup*() functions and convert the core code and a few usage sites of the less frequently used interfaces over. The bulk of the htimer_init() to hrtimer_setup() conversion is already prepared and scheduled for the next merge window. - Drivers: - Ensure that the global timekeeping clocksource is utilizing the cluster 0 timer on MIPS multi-cluster systems. Otherwise CPUs on different clusters use their cluster specific clocksource which is not guaranteed to be synchronized with other clusters. - Mostly boring cleanups, fixes, improvements and code movement" * tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) posix-timers: Fix spurious warning on double enqueue versus do_exit() clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties clocksource/drivers/gpx: Remove redundant casts clocksource/drivers/timer-ti-dm: Fix child node refcount handling dt-bindings: timer: actions,owl-timer: convert to YAML clocksource/drivers/ralink: Add Ralink System Tick Counter driver clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource clocksource/drivers/timer-ti-dm: Don't fail probe if int not found clocksource/drivers:sp804: Make user selectable clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions hrtimers: Delete hrtimer_init_on_stack() alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack() io_uring: Switch to use hrtimer_setup_on_stack() sched/idle: Switch to use hrtimer_setup_on_stack() hrtimers: Delete hrtimer_init_sleeper_on_stack() wait: Switch to use hrtimer_setup_sleeper_on_stack() timers: Switch to use hrtimer_setup_sleeper_on_stack() net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack() futex: Switch to use hrtimer_setup_sleeper_on_stack() fs/aio: Switch to use hrtimer_setup_sleeper_on_stack() ...
2024-11-19Compiler Attributes: disable __counted_by for clang < 19.1.3Jan Hendrik Farr
This patch disables __counted_by for clang versions < 19.1.3 because of the two issues listed below. It does this by introducing CONFIG_CC_HAS_COUNTED_BY. 1. clang < 19.1.2 has a bug that can lead to __bdos returning 0: https://github.com/llvm/llvm-project/pull/110497 2. clang < 19.1.3 has a bug that can lead to __bdos being off by 4: https://github.com/llvm/llvm-project/pull/112636 Fixes: c8248faf3ca2 ("Compiler Attributes: counted_by: Adjust name and identifier expansion") Cc: stable@vger.kernel.org # 6.6.x: 16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version Cc: stable@vger.kernel.org # 6.6.x: 2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL Cc: stable@vger.kernel.org # 6.6.x: 231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support Cc: stable@vger.kernel.org # 6.6.x Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240913164630.GA4091534@thelio-3990X/ Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409260949.a1254989-oliver.sang@intel.com Link: https://lore.kernel.org/all/Zw8iawAF5W2uzGuh@archlinux/T/#m204c09f63c076586a02d194b87dffc7e81b8de7b Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jan Hendrik Farr <kernel@jfarr.cc> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241029140036.577804-2-kernel@jfarr.cc Signed-off-by: Kees Cook <kees@kernel.org>
2024-11-18Merge tag 'vfs-6.13.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s, prompting a revert in commit 4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when deleting a file""). This reintroduces the concept conditionally through a sysctl - Expand the statmount() system call: * Report the filesystem subtype in a new fs_subtype field to e.g., report fuse filesystem subtypes * Report the superblock source in a new sb_source field * Add a new way to return filesystem specific mount options in an option array that returns filesystem specific mount options separated by zero bytes and unescaped. This allows caller's to retrieve filesystem specific mount options and immediately pass them to e.g., fsconfig() without having to unescape or split them * Report security (LSM) specific mount options in a separate security option array. We don't lump them together with filesystem specific mount options as security mount options are generic and most users aren't interested in them The format is the same as for the filesystem specific mount option array - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command - Optimize acl_permission_check() to avoid costly {g,u}id ownership checks if possible - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict() - Add synchronous wakeup support for ep_poll_callback. Currently, epoll only uses wake_up() to wake up task. But sometimes there are epoll users which want to use the synchronous wakeup flag to give a hint to the scheduler, e.g., the Android binder driver. So add a wake_up_sync() define, and use wake_up_sync() when sync is true in ep_poll_callback() Fixes: - Fix kernel documentation for inode_insert5() and iget5_locked() - Annotate racy epoll check on file->f_ep - Make F_DUPFD_QUERY associative - Avoid filename buffer overrun in initramfs - Don't let statmount() return empty strings - Add a cond_resched() to dump_user_range() to avoid hogging the CPU - Don't query the device logical blocksize multiple times for hfsplus - Make filemap_read() check that the offset is positive or zero Cleanups: - Various typo fixes - Cleanup wbc_attach_fdatawrite_inode() - Add __releases annotation to wbc_attach_and_unlock_inode() - Add hugetlbfs tracepoints - Fix various vfs kernel doc parameters - Remove obsolete TODO comment from io_cancel() - Convert wbc_account_cgroup_owner() to take a folio - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add() - Reorder struct posix_acl to save 8 bytes - Annotate struct posix_acl with __counted_by() - Replace one-element array with flexible array member in freevxfs - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()" * tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) statmount: retrieve security mount options vfs: make evict() use smp_mb__after_spinlock instead of smp_mb statmount: add flag to retrieve unescaped options fs: add the ability for statmount() to report the sb_source writeback: wbc_attach_fdatawrite_inode out of line writeback: add a __releases annoation to wbc_attach_and_unlock_inode fs: add the ability for statmount() to report the fs_subtype fs: don't let statmount return empty strings fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel() hfsplus: don't query the device logical block size multiple times freevxfs: Replace one-element array with flexible array member fs: optimize acl_permission_check() initramfs: avoid filename buffer overrun fs/writeback: convert wbc_account_cgroup_owner to take a folio acl: Annotate struct posix_acl with __counted_by() acl: Realign struct posix_acl to save 8 bytes epoll: Add synchronous wakeup support for ep_poll_callback coredump: add cond_resched() to dump_user_range mm/page-writeback.c: Fix comment of wb_domain_writeout_add() mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL ...
2024-11-07signal: Provide ignored_posix_timers listThomas Gleixner
To prepare for handling posix timer signals on sigaction(SIG_IGN) properly, add a list to task::signal. This list will be used to queue posix timers so their signal can be requeued when SIG_IGN is lifted later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20241105064213.920101900@linutronix.de
2024-11-05lib/Makefile: make union-find compilation conditional on CONFIG_CPUSETSKuan-Wei Chiu
Currently, cpuset is the only user of the union-find implementation. Compiling union-find in all configurations unnecessarily increases the code size when building the kernel without cgroup support. Modify the build system to compile union-find only when CONFIG_CPUSETS is enabled. Link: https://lore.kernel.org/lkml/1ccd6411-5002-4574-bb8e-3e64bba6a757@redhat.com/ Link: https://lkml.kernel.org/r/20241011141214.87096-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Waiman Long <llong@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Xavier <xavier_qy@163.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31initramfs: avoid filename buffer overrunDavid Disseldorp
The initramfs filename field is defined in Documentation/driver-api/early-userspace/buffer-format.rst as: 37 cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data ... 55 ============= ================== ========================= 56 Field name Field size Meaning 57 ============= ================== ========================= ... 70 c_namesize 8 bytes Length of filename, including final \0 When extracting an initramfs cpio archive, the kernel's do_name() path handler assumes a zero-terminated path at @collected, passing it directly to filp_open() / init_mkdir() / init_mknod(). If a specially crafted cpio entry carries a non-zero-terminated filename and is followed by uninitialized memory, then a file may be created with trailing characters that represent the uninitialized memory. The ability to create an initramfs entry would imply already having full control of the system, so the buffer overrun shouldn't be considered a security vulnerability. Append the output of the following bash script to an existing initramfs and observe any created /initramfs_test_fname_overrunAA* path. E.g. ./reproducer.sh | gzip >> /myinitramfs It's easiest to observe non-zero uninitialized memory when the output is gzipped, as it'll overflow the heap allocated @out_buf in __gunzip(), rather than the initrd_start+initrd_size block. ---- reproducer.sh ---- nilchar="A" # change to "\0" to properly zero terminate / pad magic="070701" ino=1 mode=$(( 0100777 )) uid=0 gid=0 nlink=1 mtime=1 filesize=0 devmajor=0 devminor=1 rdevmajor=0 rdevminor=0 csum=0 fname="initramfs_test_fname_overrun" namelen=$(( ${#fname} + 1 )) # plus one to account for terminator printf "%s%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%s" \ $magic $ino $mode $uid $gid $nlink $mtime $filesize \ $devmajor $devminor $rdevmajor $rdevminor $namelen $csum $fname termpadlen=$(( 1 + ((4 - ((110 + $namelen) & 3)) % 4) )) printf "%.s${nilchar}" $(seq 1 $termpadlen) ---- reproducer.sh ---- Symlink filename fields handled in do_symlink() won't overrun past the data segment, due to the explicit zero-termination of the symlink target. Fix filename buffer overrun by aborting the initramfs FSM if any cpio entry doesn't carry a zero-terminator at the expected (name_len - 1) offset. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20241030035509.20194-2-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-13cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERSAlice Ryhl
The HAVE_CFI_ICALL_NORMALIZE_INTEGERS option has some tricky conditions when KASAN or GCOV are turned on, as in that case we need some clang and rustc fixes [1][2] to avoid boot failures. The intent with the current setup is that you should be able to override the check and turn on the option if your clang/rustc has the fix. However, this override does not work in practice. Thus, use the new RUSTC_LLVM_VERSION to correctly implement the check for whether the fix is available. Additionally, remove KASAN_HW_TAGS from the list of incompatible options. The CFI_ICALL_NORMALIZE_INTEGERS option is incompatible with KASAN because LLVM will emit some constructors when using KASAN that are assigned incorrect CFI tags. These constructors are emitted due to use of -fsanitize=kernel-address or -fsanitize=kernel-hwaddress that are respectively passed when KASAN_GENERIC or KASAN_SW_TAGS are enabled. However, the KASAN_HW_TAGS option relies on hardware support for MTE instead and does not pass either flag. (Note also that KASAN_HW_TAGS does not `select CONSTRUCTORS`.) Link: https://github.com/llvm/llvm-project/pull/104826 [1] Link: https://github.com/rust-lang/rust/pull/129373 [2] Fixes: 4c66f8307ac0 ("cfi: encode cfi normalized integers + kasan/gcov bug in Kconfig") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20241010-icall-detect-vers-v1-2-8f114956aa88@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-13kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`Gary Guo
Each version of Rust supports a range of LLVM versions. There are cases where we want to gate a config on the LLVM version instead of the Rust version. Normalized cfi integer tags are one example [1]. The invocation of rustc-version is being moved from init/Kconfig to scripts/Kconfig.include for consistency with cc-version. Link: https://lore.kernel.org/all/20240925-cfi-norm-kasan-fix-v1-1-0328985cdf33@google.com/ [1] Signed-off-by: Gary Guo <gary@garyguo.net> Link: https://lore.kernel.org/r/20241011114040.3900487-1-gary@garyguo.net [ Added missing `-llvm` to the Usage documentation. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-09tracing: Allow system call tracepoints to handle page faultsMathieu Desnoyers
Use Tasks Trace RCU to protect iteration of system call enter/exit tracepoint probes to allow those probes to handle page faults. In preparation for this change, all tracers registering to system call enter/exit tracepoints should expect those to be called with preemption enabled. This allows tracers to fault-in userspace system call arguments such as path strings within their probe callbacks. Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Link: https://lore.kernel.org/20241009010718.2050182-6-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-01init: Don't proxy `console=` to earlyconRaul E Rangel
Today we are proxying the `console=` command line args to the `param_setup_earlycon()` handler. This is done because the following are equivalent: console=uart[8250],mmio,<addr>[,options] earlycon=uart[8250],mmio,<addr>[,options] Both invocations enable an early `bootconsole`. `console=uartXXXX` is just an alias for `earlycon=uartXXXX`. In addition, when `earlycon=` (empty value) or just `earlycon` (no value) is specified on the command line, we enable the earlycon `bootconsole` specified by the SPCR table or the DT. The problem arises when `console=` (empty value) is specified on the command line. It's intention is to disable the `console`, but what happens instead is that the SPRC/DT console gets enabled. This happens because we are proxying the `console=` (empty value) parameter to the `earlycon` handler. The `earlycon` handler then sees that the parameter value is empty, so it enables the SPCR/DT `bootconsole`. This change makes it so that the `console` or `console=` parameters no longer enable the SPCR/DT `bootconsole`. I also cleans up the hack in `main.c` that would forward the `console` parameter to the `earlycon` handler. Signed-off-by: Raul E Rangel <rrangel@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240911123507.v2.1.Id08823b2f848237ae90ce5c5fa7e027e97c33ad3@changeid Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-09-26cfi: encode cfi normalized integers + kasan/gcov bug in KconfigAlice Ryhl
There is a bug in the LLVM implementation of KASAN and GCOV that makes these options incompatible with the CFI_ICALL_NORMALIZE_INTEGERS option. The bug has already been fixed in llvm/clang [1] and rustc [2]. However, Kconfig currently has no way to gate features on the LLVM version inside rustc, so we cannot write down a precise `depends on` clause in this case. Instead, a `def_bool` option is defined for whether CFI_ICALL_NORMALIZE_INTEGERS is available, and its default value is set to false when GCOV or KASAN are turned on. End users using a patched clang/rustc can turn on the HAVE_CFI_ICALL_NORMALIZE_INTEGERS option directly to override this. An alternative solution is to inspect a binary created by clang or rustc to see whether the faulty CFI tags are in the binary. This would be a precise check, but it would involve hard-coding the *hashed* version of the CFI tag. This is because there's no way to get clang or rustc to output the unhased version of the CFI tag. Relying on the precise hashing algorithm using by CFI seems too fragile, so I have not pursued this option. Besides, this kind of hack is exactly what lead to the LLVM bug in the first place. If the CFI_ICALL_NORMALIZE_INTEGERS option is used without CONFIG_RUST, then we actually can perform a precise check today: just compare the clang version number. This works since clang and llvm are always updated in lockstep. However, encoding this in Kconfig would give the HAVE_CFI_ICALL_NORMALIZE_INTEGERS option a dependency on CONFIG_RUST, which is not possible as the reverse dependency already exists. HAVE_CFI_ICALL_NORMALIZE_INTEGERS is defined to be a `def_bool` instead of `bool` to avoid asking end users whether they want to turn on the option. Turning it on explicitly is something only experts should do, so making it hard to do so is not an issue. I added a `depends on CFI_CLANG` clause to the new Kconfig option. I'm not sure whether that makes sense or not, but it doesn't seem to make a big difference. In a future kernel release, I would like to add a Kconfig option similar to CLANG_VERSION/RUSTC_VERSION for inspecting the version of the LLVM inside rustc. Once that feature lands, this logic will be replaced with a precise version check. This check is not being introduced here to avoid introducing a new _VERSION constant in a fix. Link: https://github.com/llvm/llvm-project/pull/104826 [1] Link: https://github.com/rust-lang/rust/pull/129373 [2] Fixes: ce4a2620985c ("cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409231044.4f064459-oliver.sang@intel.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20240925-cfi-norm-kasan-fix-v1-1-0328985cdf33@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-09-26rust: KASAN+RETHUNK requires rustc 1.83.0Alice Ryhl
When enabling both KASAN and RETHUNK, objtool emits the following warnings: rust/core.o: warning: objtool: asan.module_ctor+0x13: 'naked' return found in MITIGATION_RETHUNK build rust/core.o: warning: objtool: asan.module_dtor+0x13: 'naked' return found in MITIGATION_RETHUNK build This is caused by the -Zfunction-return=thunk-extern flag in rustc not informing LLVM about the mitigation at the module level (it does so at the function level only currently, which covers most cases, but both are required), which means that the KASAN functions asan.module_ctor and asan.module_dtor are generated without the rethunk mitigation. The other mitigations that we enabled for Rust (SLS, RETPOLINE) do not have the same bug, as they're being applied through the target-feature functionality instead. This is being fixed for rustc 1.83.0, so update Kconfig to reject this configuration on older compilers. Link: https://github.com/rust-lang/rust/pull/130824 Fixes: d7868550d573 ("x86/rust: support MITIGATION_RETHUNK") Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://lore.kernel.org/all/CANiq72myZL4_poCMuNFevtpYYc0V0embjSuKb7y=C+m3vVA_8g@mail.gmail.com/ Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240926093849.1192264-1-aliceryhl@google.com [ Reworded to add the details mentioned in the list. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-09-26rust: cfi: fix `patchable-function-entry` starting versionMiguel Ojeda
The `-Zpatchable-function-entry` flag is available since Rust 1.81.0, not Rust 1.80.0, i.e. commit ac7595fdb1ee ("Support for -Z patchable-function-entry") in upstream Rust. Fixes: ca627e636551 ("rust: cfi: add support for CFI_CLANG with Rust") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Fiona Behrens <me@kloenk.dev> Link: https://lore.kernel.org/r/20240925141944.277936-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-09-25Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool warnings), teach objtool about 'noreturn' Rust symbols and mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be objtool-warning-free, so enable it to run for all Rust object files. - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support. - Support 'RUSTC_VERSION', including re-config and re-build on change. - Split helpers file into several files in a folder, to avoid conflicts in it. Eventually those files will be moved to the right places with the new build system. In addition, remove the need to manually export the symbols defined there, reusing existing machinery for that. - Relax restriction on configurations with Rust + GCC plugins to just the RANDSTRUCT plugin. 'kernel' crate: - New 'list' module: doubly-linked linked list for use with reference counted values, which is heavily used by the upcoming Rust Binder. This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc' exists using an atomic), 'ListLinks' (the prev/next pointers for an item in a linked list), 'List' (the linked list itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows to remove elements), 'ListArcField' (a field exclusively owned by a 'ListArc'), as well as support for heterogeneous lists. - New 'rbtree' module: red-black tree abstractions used by the upcoming Rust Binder. This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation for a node), 'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor' (bidirectional cursor that allows to remove elements), as well as an entry API similar to the Rust standard library one. - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite' trait. Add the 'assert_pinned!' macro. - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by introducing an associated type in the trait. - 'alloc' module: add 'drop_contents' method to 'BoxExt'. - 'types' module: implement the 'ForeignOwnable' trait for 'Pin<Box<T>>' and improve the trait's documentation. In addition, add the 'into_raw' method to the 'ARef' type. - 'error' module: in preparation for the upcoming Rust support for 32-bit architectures, like arm, locally allow Clippy lint for those. Documentation: - https://rust.docs.kernel.org has been announced, so link to it. - Enable rustdoc's "jump to definition" feature, making its output a bit closer to the experience in a cross-referencer. - Debian Testing now also provides recent Rust releases (outside of the freeze period), so add it to the list. MAINTAINERS: - Trevor is joining as reviewer of the "RUST" entry. And a few other small bits" * tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits) kasan: rust: Add KASAN smoke test via UAF kbuild: rust: Enable KASAN support rust: kasan: Rust does not support KHWASAN kbuild: rust: Define probing macros for rustc kasan: simplify and clarify Makefile rust: cfi: add support for CFI_CLANG with Rust cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS rust: support for shadow call stack sanitizer docs: rust: include other expressions in conditional compilation section kbuild: rust: replace proc macros dependency on `core.o` with the version text kbuild: rust: rebuild if the version text changes kbuild: rust: re-run Kconfig if the version text changes kbuild: rust: add `CONFIG_RUSTC_VERSION` rust: avoid `box_uninit_write` feature MAINTAINERS: add Trevor Gross as Rust reviewer rust: rbtree: add `RBTree::entry` rust: rbtree: add cursor rust: rbtree: add mutable iterator rust: rbtree: add iterator rust: rbtree: add red-black tree implementation backed by the C version ...