summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-19Merge tag 'trace-v6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Initialize hash variables in ftrace subops logic The fix that simplified the ftrace subops logic opened a path where some variables could be used without being initialized, and done subtly where the compiler did not catch it. Initialize those variables to the EMPTY_HASH, which is the default hash. - Reinitialize the hash pointers after they are freed Some of the hash pointers in the subop logic were freed but may still be referenced later. To prevent use-after-free bugs, initialize them back to the EMPTY_HASH. - Free the ftrace hashes when they are replaced The fix that simplified the subops logic updated some hash pointers, but left the original hash that they were pointing to where they are no longer used. This caused a memory leak. Free the hashes that are pointed to by the pointers when they are replaced. - Fix size initialization of ftrace direct function hash The ftrace direct function hash used by BPF initialized the hash size incorrectly. It checked the size of items to a hard coded 32, which made the hash bit size of 5. The hash size is supposed to be limited by the bit size of the hash, as the bitmask is allowed to be greater than 5. Rework the size check to first pass the number of elements to fls() and then compare that to FTRACE_HASH_MAX_BITS before allocating the hash. - Fix format output of ftrace_graph_ent_entry event The field depth of the ftrace_graph_ent_entry event is of size 4 but the output showed it as unsigned long and use "%lu". Change it to unsigned int and use "%u" in the print format that is displayed to user space. - Fix the trace event filter on strings Events can be filtered on numbers or string values. The return value checked from strncpy_from_kernel_nofault() and strncpy_from_user_nofault() was used to determine if reading the strings would fault or not. It would return fault if the value was non zero, which is basically meant that it was always considering the read as a fault. - Add selftest to test trace event string filtering In order to catch the breakage of the string filtering, add a self test to make sure that it continues to work. * tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: selftests: Add testing a user string to filters tracing: Fix filter string testing ftrace: Fix type of ftrace_graph_ent_entry.depth ftrace: fix incorrect hash size in register_ftrace_direct() ftrace: Free ftrace hashes after they are replaced in the subops code ftrace: Reinitialize hash to EMPTY_HASH after freeing ftrace: Initialize variables for ftrace_startup/shutdown_subops()
2025-04-19Merge tag 'nfsd-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - v6.15 libcrc clean-up makes invalid configurations possible - Fix a potential deadlock introduced during the v6.15 merge window * tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: decrease sc_count directly if fail to queue dl_recall nfs: add missing selections of CONFIG_CRC32
2025-04-19Merge tag 'rust-fixes-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Fix missing KASAN LLVM flags on first build (and fix spurious rebuilds) by skipping '--target' - Fix Make < 4.3 build error by using '$(pound)' - Fix UML build error by removing 'volatile' qualifier from io helpers - Fix UML build error by adding 'dma_{alloc,free}_attrs()' helpers - Clean gendwarfksyms warnings by avoiding to export '__pfx' symbols - Clean objtool warning by adding a new 'noreturn' function for 1.86.0 - Disable 'needless_continue' Clippy lint due to new 1.86.0 warnings - Add missing 'ffi' crate to 'generate_rust_analyzer.py' 'pin-init' crate: - Import a couple fixes from upstream" * tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: helpers: Add dma_alloc_attrs() and dma_free_attrs() rust: helpers: Remove volatile qualifier from io helpers rust: kbuild: use `pound` to support GNU Make < 4.3 objtool/rust: add one more `noreturn` Rust function for Rust 1.86.0 rust: kasan/kbuild: fix missing flags on first build rust: disable `clippy::needless_continue` rust: kbuild: Don't export __pfx symbols rust: pin-init: use Markdown autolinks in Rust comments rust: pin-init: alloc: restrict `impl ZeroableOption` for `Box` to `T: Sized` scripts: generate_rust_analyzer: Add ffi crate
2025-04-19Merge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Easter rc3 pull request, fixes in all the usuals, amdgpu, xe, msm, with some i915/ivpu/mgag200/v3d fixes, then a couple of bits in dma-buf/gem. Hopefully has no easter eggs in it. dma-buf: - Correctly decrement refcounter on errors gem: - Fix test for imported buffers amdgpu: - Cleaner shader sysfs fix - Suspend fix - Fix doorbell free ordering - Video caps fix - DML2 memory allocation optimization - HDP fix i915: - Fix DP DSC configurations that require 3 DSC engines per pipe xe: - Fix LRC address being written too late for GuC - Fix notifier vs folio deadlock - Fix race betwen dma_buf unmap and vram eviction - Fix debugfs handling PXP terminations unconditionally msm: - Display: - Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in case of multi-rect - Fix to validate plane_state pointer before using it in dpu_plane_virtual_atomic_check() - Fix to make sure dereferencing dpu_encoder_phys happens after making sure it is valid in _dpu_encoder_trigger_start() - Remove the remaining intr_tear_rd_ptr which we initialized to -1 because NO_IRQ indices start from 0 now - GPU: - Fix IB_SIZE overflow ivpu: - Fix debugging - Fixes to frequency - Support firmware API 3.28.3 - Flush jobs upon reset mgag200: - Set vblank start to correct values v3d: - Fix Indirect Dispatch" * tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel: (26 commits) drm/msm/a6xx+: Don't let IB_SIZE overflow drm/xe/pxp: do not queue unneeded terminations from debugfs drm/xe/dma_buf: stop relying on placement in unmap drm/xe/userptr: fix notifier vs folio deadlock drm/xe: Set LRC addresses before guc load drm/mgag200: Fix value in <VBLKSTR> register drm/gem: Internally test import_attach for imported objects drm/amdgpu: Use the right function for hdp flush drm/amd/display/dml2: use vzalloc rather than kzalloc drm/amdgpu: Add back JPEG to video caps for carrizo and newer drm/amdgpu: fix warning of drm_mm_clean drm/amd: Forbid suspending into non-default suspend states drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4 drm/i915/dp: Check for HAS_DSC_3ENGINES while configuring DSC slices drm/i915/display: Add macro for checking 3 DSC engines dma-buf/sw_sync: Decrement refcount on error in sw_sync_ioctl_get_deadline() accel/ivpu: Add cmdq_id to job related logs accel/ivpu: Show NPU frequency in sysfs accel/ivpu: Fix the NPU's DPU frequency calculation accel/ivpu: Update FW Boot API to version 3.28.3 ...
2025-04-19Merge tag 'drm-msm-fixes-2025-04-18' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.15-rc3 Display: - Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in case of multi-rect - Fix to validate plane_state pointer before using it in dpu_plane_virtual_atomic_check() - Fix to make sure dereferencing dpu_encoder_phys happens after making sure it is valid in _dpu_encoder_trigger_start() - Remove the remaining intr_tear_rd_ptr which we initialized to -1 because NO_IRQ indices start from 0 now GPU: - Fix IB_SIZE overflow Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://lore.kernel.org/r/CAF6AEGtVKXEVdzUzFWmQE8JmK3nx_hp+ynOd-5j3vnfcU-sgOA@mail.gmail.com
2025-04-19Merge tag 'drm-xe-fixes-2025-04-18' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix LRC address being written too late for GuC - Fix notifier vs folio deadlock - Fix race betwen dma_buf unmap and vram eviction - Fix debugfs handling PXP terminations unconditionally Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/ndinq644zenywaaycxyfqqivsb2xer4z7err3dlpalbz33jfkm@ttabzsg6wnet
2025-04-18Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix hard link lease key problem when close is deferred - Revert the socket lockdep/refcount workarounds done in cifs.ko now that it is fixed at the socket layer * tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: Revert "smb: client: fix TCP timers deadlock after rmmod" Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free" smb3 client: fix open hardlink on deferred close file error
2025-04-18drm/msm/a6xx+: Don't let IB_SIZE overflowRob Clark
IB_SIZE is only b0..b19. Starting with a6xx gen3, additional fields were added above the IB_SIZE. Accidentially setting them can cause badness. Fix this by properly defining the CP_INDIRECT_BUFFER packet and using the generated builder macro to ensure unintended bits are not set. v2: add missing type attribute for IB_BASE v3: fix offset attribute in xml Reported-by: Connor Abbott <cwabbott0@gmail.com> Fixes: a83366ef19ea ("drm/msm/a6xx: add A640/A650 to gpulist") Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/643396/
2025-04-18Merge tag 'i2c-host-fixes-6.15-rc3' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.15-rc3 - ChromeOS EC tunnel: fix potential NULL pointer dereference
2025-04-18Merge tag 'x86-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix hypercall detection on Xen guests - Extend the AMD microcode loader SHA check to Zen5, to block loading of any unreleased standalone Zen5 microcode patches - Add new Intel CPU model number for Bartlett Lake - Fix the workaround for AMD erratum 1054 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI stub * tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Avoid shared GHCB page for early memory acceptance x86/cpu/amd: Fix workaround for erratum 1054 x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches x86/xen: Fix __xen_hypercall_setfunc()
2025-04-18Merge tag 'timers-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a lockdep false positive in the i8253 driver" * tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/i8253: Call clockevent_i8253_disable() with interrupts disabled
2025-04-18Merge tag 'perf-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fixes from Ingo Molnar: "Miscellaneous fixes and a hardware-enabling change: - Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR systems - Fix Intel PEBS buffer overflow handling - Fix skid in Intel PEBS sampling of user-space general purpose registers - Enable Panther Lake PMU support - similar to Lunar Lake" * tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add Panther Lake support perf/x86/intel: Allow to update user space GPRs from PEBS records perf/x86/intel: Don't clear perf metrics overflow bit unconditionally perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR
2025-04-18Merge tag 'irq-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc irq fixes from Ingo Molnar: - Fix BCM2712 irqchip driver Kconfig dependencies required on the Raspberry PI5 - Fix spurious interrupts on RZ/G3E SMARC EVK systems - Fix crash regression on Sun/NIU hardware - Apply MSI driver quirk for Sun Neptune chips * tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled irqchip/renesas-rzv2h: Prevent TINT spurious interrupt net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
2025-04-18Merge tag 'core-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc core fixes from Ingo Molnar: "Fix a genksyms related bug, triggered by recent changes to the percpu code, and update the .clang-format file to not include obsolete function names" * tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genksyms: Handle typeof_unqual keyword and __seg_{fs,gs} qualifiers clang-format: Update the ForEachMacros list for v6.15-rc1
2025-04-18Merge tag 'hardening-v6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert Uytterhoeven) - ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh) - ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan Chancellor) - string: Add load_unaligned_zeropad() code path to sized_strscpy() (Peter Collingbourne) - kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo Frascino) - Disable GCC randstruct for COMPILE_TEST * tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/prime_numbers: KUnit test should not select PRIME_NUMBERS ubsan: Fix panic from test_ubsan_out_of_bounds lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP hardening: Disable GCC randstruct for COMPILE_TEST kasan: Add strscpy() test to trigger tag fault on arm64 string: Add load_unaligned_zeropad() code path to sized_strscpy()
2025-04-18Merge tag 'gpio-fixes-for-v6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: - check for both the new AND old (deprecated) setter callback when changing GPIO direction to output * tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Allow to use setters with return value for output-only gpios
2025-04-18Merge tag 'thermal-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Add missing DVFS support flags for the Lunar Lake and Panther Lake platforms to the int340x Intel thermal driver and fix DLVR support for Panther Lake in it (Srinivas Pandruvada)" * tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: int340x: Fix Panther Lake DLVR support thermal: intel: int340x: Add missing DVFS support flags
2025-04-18Merge tag 'pm-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These are mostly cpufreq fixes, some of which address recent regressions and some address older issues that have come to light during the last two weeks, and a runtime PM documentation correction: - Fix the performance-to-frequency scaling factor computation on systems using HWP in the intel_pstate driver after a recent incorrect update of it (Rafael Wysocki) - Fix the usage of the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil cpufreq governor after a recent update of it that has caused frequency limits changes to be missed sometimes (Rafael Wysocki) - Address some recently discovered synchronization issues related to frequency limits changes in the schedutil cpufreq governor and in the cpufreq core (Rafael Wysocki) - Fix ITMT support in the amd-pstate cpufreq driver so that it is enabled after asym priorities have been correctly initialized for all CPUs (K Prateek Nayak) - Fix changing min/max limits in the amd-pstate cpufreq driver while on the performance governor (Dhananjay Ugwekar) - Fix a function name in the runtime PM documentation that was previously incorrectly updated by mistake (Sakari Ailus)" * tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid using inconsistent policy->min and policy->max cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() cpufreq/sched: Explicitly synchronize limits_changed flag handling cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend() cpufreq: intel_pstate: Fix hwp_get_cpu_scaling() cpufreq/amd-pstate: Enable ITMT support after initializing core rankings cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor
2025-04-18Merge branch 'pm-docs'Rafael J. Wysocki
Merge a runtime PM documentation correction for 6.15-rc3. * pm-docs: Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
2025-04-18Merge tag 'riscv-for-linus-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for an issue where C instructions ended up in non-C builds, due to some broken inline assembly in the KGDB breakpoint insertion code - A fix to avoid spurious printk messages about misaligned access performance probing - A fix for a handful of issues with /proc/iomem's reserved region handling - A pair of fixes for module relocation processing - A few build-time fixes * tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break riscv: KGDB: Do not inline arch_kgdb_breakpoint() riscv: Avoid fortify warning in syscall_get_arguments() riscv: Provide all alternative macros all the time riscv: module: Allocate PLT entries for R_RISCV_PLT32 riscv: module: Fix out-of-bounds relocation access riscv: Properly export reserved regions in /proc/iomem riscv: Fix unaligned access info messages riscv: Avoid fortify warning in syscall_get_arguments() Documentation: riscv: Fix typo MIMPLID -> MIMPID riscv: Use kvmalloc_array on relocation_hashtable
2025-04-18Merge tag 'linux_kselftest-kunit-fixes-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fix from Shuah Khan: "Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline" * tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: qemu_configs: SH: Respect kunit cmdline
2025-04-18Merge tag 'linux_kselftest-fixes-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "Fixes dynevent_limitations.tc test failure on dash by detecting and handling bash and dash differences in evaluating \\" * tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc
2025-04-18Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix integer overflow in server disconnect deadtime calculation - Three fixes for potential use after frees: one for oplocks, and one for leases and one for kerberos authentication - Fix to prevent attempted write to directory - Fix locking warning for durable scavenger thread * tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Prevent integer overflow in calculation of deadtime ksmbd: fix the warning from __kernel_write_iter ksmbd: fix use-after-free in smb_break_all_levII_oplock() ksmbd: fix use-after-free in __smb2_lease_break_noti() ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING" ksmbd: Fix dangling pointer in krb_authenticate
2025-04-18Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - MD pull via Yu: - fix raid10 missing discard IO accounting (Yu Kuai) - fix bitmap stats for bitmap file (Zheng Qixing) - fix oops while reading all member disks failed during check/repair (Meir Elisha) - NVMe pull via Christoph: - fix scan failure for non-ANA multipath controllers (Hannes Reinecke) - fix multipath sysfs links creation for some cases (Hannes Reinecke) - PCIe endpoint fixes (Damien Le Moal) - use NULL instead of 0 in the auth code (Damien Le Moal) - Various ublk fixes: - Slew of selftest additions - Improvements and fixes for IO cancelation - Tweak to Kconfig verbiage - Fix for page dirtying for blk integrity mapped pages - loop fixes: - buffered IO fix - uevent fixes - request priority inheritance fix - Various little fixes * tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits) selftests: ublk: add generic_06 for covering fault inject ublk: simplify aborting ublk request ublk: remove __ublk_quiesce_dev() ublk: improve detection and handling of ublk server exit ublk: move device reset into ublk_ch_release() ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io ublk: add ublk_force_abort_dev() ublk: properly serialize all FETCH_REQs selftests: ublk: move creating UBLK_TMP into _prep_test() selftests: ublk: add test_stress_05.sh selftests: ublk: support user recovery selftests: ublk: support target specific command line selftests: ublk: increase max nr_queues and queue depth selftests: ublk: set queue pthread's cpu affinity selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN selftests: ublk: add two stress tests for zero copy feature selftests: ublk: run stress tests in parallel selftests: ublk: make sure _add_ublk_dev can return in sub-shell selftests: ublk: cleanup backfile automatically selftests: ublk: add io_uring uapi header ...
2025-04-18Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Correctly cap iov_iter->nr_segs for imports of registered buffers, both kbuf and normal ones. Three cleanups to make it saner first, then two fixes for each of the buffer types. This fixes a performance regression where partial buffer usage doesn't trim the tail number of segments, leading the block layer to iterate the IOs to check if it needs splitting. - Two patches tweaking the newly introduced zero-copy rx API, mostly to keep the API consistent once we add multiple interface queues per ring support in the 6.16 release. - zc rx unmapping fix for a dead device * tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux: io_uring/zcrx: fix late dma unmap for a dead dev io_uring/rsrc: ensure segments counts are correct on kbuf buffers io_uring/rsrc: send exact nr_segs for fixed buffer io_uring/rsrc: refactor io_import_fixed io_uring/rsrc: separate kbuf offset adjustments io_uring/rsrc: don't skip offset calculation io_uring/zcrx: add pp to ifq conversion helper io_uring/zcrx: return ifq id to the user
2025-04-18soc: ti: knav_qmss_queue: Remove unnecessary NULL check before free_percpu()Chen Ni
free_percpu() checks for NULL pointers internally. Remove unneeded NULL check here. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20250417084836.937452-1-nichen@iscas.ac.cn Signed-off-by: Nishanth Menon <nm@ti.com>
2025-04-18soc: ti: k3-ringacc: Use device_match_of_node()Tang Dongxing
Replace the open-code with device_match_of_node(). Signed-off-by: Tang Dongxing <tang.dongxing@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Link: https://lore.kernel.org/r/20250331201425296l4h98bZjxHzs08fdvHrGO@zte.com.cn Signed-off-by: Nishanth Menon <nm@ti.com>
2025-04-18tracing: selftests: Add testing a user string to filtersSteven Rostedt
Running the following commands was broken: # cd /sys/kernel/tracing # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter # echo 1 > events/syscalls/sys_enter_openat/enable # ls /proc/$$/maps # cat trace And would produce nothing when it should have produced something like: ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0) Add a test to check this case so that it will be caught if it breaks again. Link: https://lore.kernel.org/linux-trace-kernel/20250417183003.505835fb@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/20250418101208.38dc81f5@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-18x86/boot/sev: Avoid shared GHCB page for early memory acceptanceArd Biesheuvel
Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
2025-04-18x86/cpu/amd: Fix workaround for erratum 1054Sandipan Das
Erratum 1054 affects AMD Zen processors that are a part of Family 17h Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However, when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected processors was incorrectly changed in a way that the IRPerfEn bit gets set only for unaffected Zen 1 processors. Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This includes a subset of Zen 1 (Family 17h Models 30h and above) and all later processors. Also clear X86_FEATURE_IRPERF on affected processors so that the IRPerfCount register is not used by other entities like the MSR PMU driver. Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com
2025-04-18io_uring/zcrx: fix late dma unmap for a dead devPavel Begunkov
There is a problem with page pools not dma-unmapping immediately when the device is going down, and delaying it until the page pool is destroyed, which is not allowed (see links). That just got fixed for normal page pools, and we need to address memory providers as well. Unmap pages in the memory provider uninstall callback, and protect it with a new lock. There is also a gap between when a dma mapping is created and the mp is installed, so if the device is killed in between, io_uring would be holding on to dma mappings to a dead device with no one to call ->uninstall. Move it to page pool init and rely on ->is_mapped to make sure it's only done once. Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/ Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/ Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17MAINTAINERS: add section for locking of mm's and VMAsLorenzo Stoakes
We place this under memory mapping as related to memory mapping abstractions in the form of mm_struct and vm_area_struct (VMA). Now we have separated out mmap/vma locking logic into the mmap_lock.c and mmap_lock.h files, so this should encapsulate the majority of the mm locking logic in the kernel. Suren is best placed to maintain this logic as the core architect of VMA locking as a whole. Link: https://lkml.kernel.org/r/e6ed679a184ca444b20dfa77af96913fd8b5efa0.1744799282.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: vmscan: fix kswapd exit condition in defrag_modeJohannes Weiner
Vlastimil points out an issue with kswapd in defrag_mode not waking up kcompactd reliably. Background: When kswapd is woken for any higher-order request, it initially checks those high-order watermarks to decide if work is necesary. However, it cannot (efficiently) meet the contiguity goal of such a request by itself. So once it has reclaimed a compaction gap, it adjusts the request down to check for free order-0 pages, then wakes kcompactd to coalesce them into larger blocks. In defrag_mode, the initial watermark check needs to be analogously against free pageblocks. However, once kswapd drops the high-order to hand off contiguity work, it also needs to fall back to base page watermarks - otherwise it'll keep reclaiming until blocks are freed. While it appears kcompactd is woken up frequently enough to do most of the compaction work, kswapd ends up overreclaiming by quite a bit: DEFRAGMODE DEFRAGMODE-thispatch Hugealloc Time mean 79381.34 ( +0.00%) 88126.12 ( +11.02%) Hugealloc Time stddev 85852.16 ( +0.00%) 135366.75 ( +57.67%) Kbuild Real time 249.35 ( +0.00%) 226.71 ( -9.04%) Kbuild User time 1249.16 ( +0.00%) 1249.37 ( +0.02%) Kbuild System time 171.76 ( +0.00%) 166.93 ( -2.79%) THP fault alloc 51666.87 ( +0.00%) 52685.60 ( +1.97%) THP fault fallback 16970.00 ( +0.00%) 15951.87 ( -6.00%) Direct compact fail 166.53 ( +0.00%) 178.93 ( +7.40%) Direct compact success 17.13 ( +0.00%) 4.13 ( -71.69%) Compact daemon scanned migrate 3095413.33 ( +0.00%) 9231239.53 ( +198.22%) Compact daemon scanned free 2155966.53 ( +0.00%) 7053692.87 ( +227.17%) Compact direct scanned migrate 265642.47 ( +0.00%) 68388.33 ( -74.26%) Compact direct scanned free 130252.60 ( +0.00%) 55634.87 ( -57.29%) Compact total migrate scanned 3361055.80 ( +0.00%) 9299627.87 ( +176.69%) Compact total free scanned 2286219.13 ( +0.00%) 7109327.73 ( +210.96%) Alloc stall 1890.80 ( +0.00%) 6297.60 ( +232.94%) Pages kswapd scanned 9043558.80 ( +0.00%) 5952576.73 ( -34.18%) Pages kswapd reclaimed 1891708.67 ( +0.00%) 1030645.00 ( -45.52%) Pages direct scanned 1017090.60 ( +0.00%) 2688047.60 ( +164.29%) Pages direct reclaimed 92682.60 ( +0.00%) 309770.53 ( +234.22%) Pages total scanned 10060649.40 ( +0.00%) 8640624.33 ( -14.11%) Pages total reclaimed 1984391.27 ( +0.00%) 1340415.53 ( -32.45%) Swap out 884585.73 ( +0.00%) 417781.93 ( -52.77%) Swap in 287106.27 ( +0.00%) 95589.73 ( -66.71%) File refaults 551697.60 ( +0.00%) 426474.80 ( -22.70%) Link: https://lkml.kernel.org/r/20250416135142.778933-3-hannes@cmpxchg.org Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: vmscan: restore high-cpu watermark safety in kswapdJohannes Weiner
Vlastimil points out that commit a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") switched kswapd from zone_watermark_ok_safe() to the standard, percpu-cached version of reading free pages, thus dropping the watermark safety precautions for systems with high CPU counts (e.g. >212 cpus on 64G). Restore them. Since zone_watermark_ok_safe() is no longer the right interface, and this was the last caller of the function anyway, open-code the zone_page_state_snapshot() conditional and delete the function. Link: https://lkml.kernel.org/r/20250416135142.778933-2-hannes@cmpxchg.org Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING sectionLorenzo Stoakes
Pedro has offered to review memory mapping code. He has good experience in this area and has provided excellent feedback on memory mapping series in the past so I feel he'll be a great addition. Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount ↵David Hildenbrand
stabilization In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute mapped shared vs. mapped exclusively) to then adjust the entire mapcount. This means that another process might stumble in do_wp_page() over a PTE-mapped PMD folio that is indicated as "exclusively mapped", but still has an entire mapcount (PMD mapping), because it is racing with the process that is unmapping the folio (PMD mapping). Note that do_wp_page() will back off once it detects the remaining folio reference from the process that is in the process of unmapping the folio. This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio)) check in do_wp_page(), that can easily be reproduced by looping a couple of times over allocating a PMD THP, forking a child where we immediately unmap it again, and writing in the parent concurrently to the THP. [ 252.738129][T16470] ------------[ cut here ]------------ [ 252.739267][T16470] WARNING: CPU: 3 PID: 16470 at mm/memory.c:3738 do_wp_page+0x2a75/0x2c00 [ 252.740968][T16470] Modules linked in: [ 252.741958][T16470] CPU: 3 UID: 0 PID: 16470 Comm: ... ... [ 252.765841][T16470] <TASK> [ 252.766419][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.767558][T16470] ? rcu_is_watching+0x12/0x60 [ 252.768525][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.769645][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.770778][T16470] ? lock_acquire+0x33/0x80 [ 252.771697][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.772735][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.773781][T16470] __handle_mm_fault+0x1869/0x3e40 [ 252.774839][T16470] handle_mm_fault+0x22a/0x640 [ 252.775808][T16470] do_user_addr_fault+0x618/0x1000 [ 252.776847][T16470] exc_page_fault+0x68/0xd0 [ 252.777775][T16470] asm_exc_page_fault+0x26/0x30 While we could adjust the sequence in __folio_remove_rmap(), let's rater move the mapcount sanity checks after the mapcount vs. refcount stabilization phase. With this fix, a simple reproducer is happy. While at it, convert the two VM_WARN_ON_ONCE() we are moving to VM_WARN_ON_ONCE_FOLIO(). Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm, hugetlb: increment the number of pages to be reset on HVOOscar Salvador
commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") shifted hugetlb specific stuff, and now mapping overlaps _hugetlb_cgroup field. Upon restoring the vmemmap for HVO, only the first two tail pages are reset, and this causes the check in free_tail_page_prepare() to fail as it finds an unexpected mapping value in some tails. Increment the number of pages to be reset to 4 (head + 3 tail pages) Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17writeback: fix false warning in inode_to_wb()Andreas Gruenbacher
inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17docs: ABI: replace mcroce@microsoft.com with new Meta addressAhmad Fatoum
The Microsoft email address is bouncing: 550 5.4.1 Recipient address rejected: Access denied. So let's replace it with Matteo's current mail address. Link: https://lkml.kernel.org/r/20250414-fix-mcroce-mail-bounce-v3-1-0aed2d71f3d7@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Acked-by: Matteo Croce <teknoraver@meta.com> Link: https://lore.kernel.org/all/BYAPR15MB2504E4B02DFFB1E55871955DA1062@BYAPR15MB2504.namprd15.prod.outlook.com/ Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matteo Croce <teknoraver@meta.com> Cc: Sascha Hauer <kernel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()Baoquan He
Not like fault_in_readable() or fault_in_writeable(), in fault_in_safe_writeable() local variable 'start' is increased page by page to loop till the whole address range is handled. However, it mistakenly calculates the size of the handled range with 'uaddr - start'. Fix it here. Andreas said: : In gfs2, fault_in_iov_iter_writeable() is used in : gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially : affects buffered as well as direct reads. This bug could cause those : gfs2 functions to spin in a loop. Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Yanjun.Zhu <yanjun.zhu@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add memory advice sectionLorenzo Stoakes
The madvise code straddles both VMA and page table manipulation. As a result, separate it out into its own section and add maintainers/reviewers as appropriate. We additionally include the mman-common.h file as this contains the shared madvise flags and it is important we maintain this alongside madvise.c. Link: https://lkml.kernel.org/r/20250411072724.10841-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add mmap trace events to MEMORY MAPPINGLiam R. Howlett
MEMORY MAPPING does not list the mmap.h trace point file, but does list the mmap.c file. Couple the trace points with the users and authors of the trace points for notifications of updates. Link: https://lkml.kernel.org/r/20250411173328.8172-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: memcontrol: fix swap counter leak from offline cgroupMuchun Song
commit 73f839b6d2ed addressed an issue regarding the swap counter leak that occurred from an offline cgroup. However, commit 89ce924f0bd4 modified the parameter from @swap_memcg to @memcg (presumably this alteration was introduced while resolving conflicts). Fix this problem by reverting this minor change. Link: https://lkml.kernel.org/r/20250410081812.10073-1-songmuchun@bytedance.com Fixes: 89ce924f0bd4 ("mm: memcontrol: move memsw charge callbacks to v1") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add MM subsection for the page allocatorVlastimil Babka
Add a subsection for the page allocator, including compaction as it's crucial for high-order allocations and works together with the anti-fragmentation features. Add reviewers (including myself) who voluteered. Link: https://lkml.kernel.org/r/20250410090021.72296-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Brendan Jackman <jackmanb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: update SLAB ALLOCATOR maintainersVlastimil Babka
With permission, reduce the number of maintainers. Create a CREDITS entry for Joonsoo (Pekka already has one). Thanks for all the work! Link: https://lkml.kernel.org/r/20250410090021.72296-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17fs/dax: fix folio splitting issue by resetting old folio order + _nr_pagesDavid Hildenbrand
Alison reports an issue with fsdax when large extends end up using large ZONE_DEVICE folios: [ 417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00 [ 417.796982] #PF: supervisor read access in kernel mode [ 417.797540] #PF: error_code(0x0000) - not-present page [ 417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0 [ 417.798690] Oops: Oops: 0000 [#1] SMP NOPTI [ 417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ... [ 417.800150] Tainted: [O]=OOT_MODULE [ 417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250 [ 417.801948] Code: ... [ 417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206 [ 417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002 [ 417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae [ 417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000 [ 417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001 [ 417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580 [ 417.807801] FS: 00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000 [ 417.808570] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0 [ 417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 417.811353] Call Trace: [ 417.811709] <TASK> [ 417.812038] folio_add_file_rmap_ptes+0x143/0x230 [ 417.812566] insert_page_into_pte_locked+0x1ee/0x3c0 [ 417.813132] insert_page+0x78/0xf0 [ 417.813558] vmf_insert_page_mkwrite+0x55/0xa0 [ 417.814088] dax_fault_iter+0x484/0x7b0 [ 417.814542] dax_iomap_pte_fault+0x1ca/0x620 [ 417.815055] dax_iomap_fault+0x39/0x40 [ 417.815499] __xfs_write_fault+0x139/0x380 [ 417.815995] ? __handle_mm_fault+0x5e5/0x1a60 [ 417.816483] xfs_write_fault+0x41/0x50 [ 417.816966] xfs_filemap_fault+0x3b/0xe0 [ 417.817424] __do_fault+0x31/0x180 [ 417.817859] __handle_mm_fault+0xee1/0x1a60 [ 417.818325] ? debug_smp_processor_id+0x17/0x20 [ 417.818844] handle_mm_fault+0xe1/0x2b0 [...] The issue is that when we split a large ZONE_DEVICE folio to order-0 ones, we don't reset the order/_nr_pages. As folio->_nr_pages overlays page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it has folio->memcg_data set. And we never manually initialize folio->memcg_data in fsdax code, because we never expect it to be set at all. When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries to use folio->memcg_data (because it's non-NULL) but it does not actually point at a memcg, resulting in the problem. Alison also observed that these folios sometimes have "locked" set, which is rather concerning (folios locked from the beginning ...). The reason is that the order for large folios is stored in page[1]->flags, which become the folio->flags of a new small folio. Let's fix it by adding a folio helper to clear order/_nr_pages for splitting purposes. Maybe we should reinitialize other large folio flags / folio members as well when splitting, because they might similarly cause harm once page[1] becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be set for fsdax, so at least page[1]->flags might be as expected with this fix. From a quick glimpse, initializing ->mapping, ->pgmap and ->share should re-initialize most things from a previous page[1] used by large folios that fsdax cares about. For example folio->private might not get reinitialized, but maybe that's not relevant -- no traces of it's use in fsdax code. Needs a closer look. Another thing that should be considered in the future is performing similar checks as we perform in free_tail_page_prepare() -- checking pincount etc. -- when freeing a large fsdax folio. Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()Kirill A. Shutemov
When the last page in the zone is accepted, __accept_page() calls static_branch_dec(). This function takes cpu_hotplug_lock, which can lead to a deadlock if the allocation occurs during CPU bringup path as _cpu_up() also takes the lock. To prevent this deadlock, defer static_branch_dec() to a workqueue. Call static_branch_dec() only when the workqueue is not yet initialized. Workqueues are initialized before CPU bring up, so this will not conflict with the first scenario. Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com Fixes: 55ad43e8ba0f ("mm: add a helper to accept page") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17tracing: Fix filter string testingSteven Rostedt
The filter string testing uses strncpy_from_kernel/user_nofault() to retrieve the string to test the filter against. The if() statement was incorrect as it considered 0 as a fault, when it is only negative that it faulted. Running the following commands: # cd /sys/kernel/tracing # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter # echo 1 > events/syscalls/sys_enter_openat/enable # ls /proc/$$/maps # cat trace Would produce nothing, but with the fix it will produce something like: ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0) Link: https://lore.kernel.org/all/CAEf4BzbVPQ=BjWztmEwBPRKHUwNfKBkS3kce-Rzka6zvbQeVpg@mail.gmail.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250417183003.505835fb@gandalf.local.home Fixes: 77360f9bbc7e5 ("tracing: Add test for user space strings when filtering on string pointers") Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17drm/xe/pxp: do not queue unneeded terminations from debugfsDaniele Ceraolo Spurio
The PXP terminate debugfs currently unconditionally simulates a termination, no matter what the HW status is. This is unneeded if PXP is not in use and can cause errors if the HW init hasn't completed yet. To solve these issues, we can simply limit the terminations to the cases where PXP is fully initialized and in use. v2: s/pxp_status/ready/ to avoid confusion with pxp->status (John) Fixes: 385a8015b214 ("drm/xe/pxp: Add PXP debugfs support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4749 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250416201622.1295369-1-daniele.ceraolospurio@intel.com (cherry picked from commit ba1f62a0cac84757ca35f4217e3cd3a2654233ae) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-17drm/xe/dma_buf: stop relying on placement in unmapMatthew Auld
The is_vram() is checking the current placement, however if we consider exported VRAM with dynamic dma-buf, it looks possible for the xe driver to async evict the memory, notifying the importer, however importer does not have to call unmap_attachment() immediately, but rather just as "soon as possible", like when the dma-resv idles. Following from this we would then pipeline the move, attaching the fence to the manager, and then update the current placement. But when the unmap_attachment() runs at some later point we might see that is_vram() is now false, and take the complete wrong path when dma-unmapping the sg, leading to explosions. To fix this check if the sgl was mapping a struct page. v2: - The attachment can be mapped multiple times it seems, so we can't really rely on encoding something in the attachment->priv. Instead see if the page_link has an encoded struct page. For vram we expect this to be NULL. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4563 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250410162716.159403-2-matthew.auld@intel.com (cherry picked from commit d755887f8e5a2a18e15e6632a5193e5feea18499) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>