summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-03-25objtool, panic: Disable SMAP in __stack_chk_fail()Josh Poimboeuf
__stack_chk_fail() can be called from uaccess-enabled code. Make sure uaccess gets disabled before calling panic(). Fixes the following warning: kernel/trace/trace_branch.o: error: objtool: ftrace_likely_update+0x1ea: call to __stack_chk_fail() with UACCESS enabled Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/a3e97e0119e1b04c725a8aa05f7bc83d98e657eb.1742852847.git.jpoimboe@kernel.org
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-25Merge tag 'x86_bugs_for_v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 speculation mitigation updates from Borislav Petkov: - Some preparatory work to convert the mitigations machinery to mitigating attack vectors instead of single vulnerabilities - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag - Add support for a Zen5-specific SRSO mitigation - Cleanups and minor improvements * tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2 x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds x86/bugs: Relocate mds/taa/mmio/rfds defines x86/bugs: Add X86_BUG_SPECTRE_V2_USER x86/bugs: Remove X86_FEATURE_USE_IBPB KVM: nVMX: Always use IBPB to properly virtualize IBRS x86/bugs: Use a static branch to guard IBPB on vCPU switch x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set() x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation() x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers x86/bugs: KVM: Add support for SRSO_MSR_FIX
2025-03-25Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Nothing major this time around. Apart from the usual perf/PMU updates, some page table cleanups, the notable features are average CPU frequency based on the AMUv1 counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in the uaccess routines. Perf and PMUs: - Support for the 'Rainier' CPU PMU from Arm - Preparatory driver changes and cleanups that pave the way for BRBE support - Support for partial virtualisation of the Apple-M1 PMU - Support for the second event filter in Arm CSPMU designs - Minor fixes and cleanups (CMN and DWC PMUs) - Enable EL2 requirements for FEAT_PMUv3p9 Power, CPU topology: - Support for AMUv1-based average CPU frequency - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds a generic topology_is_primary_thread() function overridden by x86 and powerpc New(ish) features: - MOPS (memcpy/memset) support for the uaccess routines Security/confidential compute: - Fix the DMA address for devices used in Realms with Arm CCA. The CCA architecture uses the address bit to differentiate between shared and private addresses - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by default Memory management clean-ups: - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs - Some minor page table accessor clean-ups - PIE/POE (permission indirection/overlay) helpers clean-up Kselftests: - MTE: skip hugetlb tests if MTE is not supported on such mappings and user correct naming for sync/async tag checking modes Miscellaneous: - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people request) - Sysreg updates for new register fields - CPU type info for some Qualcomm Kryo cores" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits) arm64: mm: Don't use %pK through printk perf/arm_cspmu: Fix missing io.h include arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64: Kconfig: Enable HOTPLUG_SMT arm64: topology: Support SMT control on ACPI based system arch_topology: Support SMT control for OF based system cpu/SMT: Provide a default topology_is_primary_thread() arm64/mm: Define PTDESC_ORDER perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header arm64/kernel: Always use level 2 or higher for early mappings arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() ...
2025-03-25Merge branches 'for-next/amuv1-avg-freq', 'for-next/pkey_unrestricted', ↵Catalin Marinas
'for-next/sysreg', 'for-next/misc', 'for-next/pgtable-cleanups', 'for-next/kselftest', 'for-next/uaccess-mops', 'for-next/pie-poe-cleanup', 'for-next/cputype-kryo', 'for-next/cca-dma-address', 'for-next/drop-pxd_table_bit' and 'for-next/spectre-bhb-assume-vulnerable', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm_cspmu: Fix missing io.h include perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration perf/dwc_pcie: fix duplicate pci_dev devices perf/dwc_pcie: fix some unreleased resources perf/arm-cmn: Minor event type housekeeping perf: arm_pmu: Move PMUv3-specific data perf: apple_m1: Don't disable counter in m1_pmu_enable_event() perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() perf: arm_pmu: Don't disable counter in armpmu_add() perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters perf: arm_pmuv3: Add support for ARM Rainier PMU * for-next/amuv1-avg-freq: : Add support for AArch64 AMUv1-based average freq arm64: Utilize for_each_cpu_wrap for reference lookup arm64: Update AMU-based freq scale factor on entering idle arm64: Provide an AMU-based version of arch_freq_get_on_cpu cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry cpufreq: Allow arch_freq_get_on_cpu to return an error arch_topology: init capacity_freq_ref to 0 * for-next/pkey_unrestricted: : mm/pkey: Add PKEY_UNRESTRICTED macro selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc917 selftests/powerpc: Use PKEY_UNRESTRICTED macro selftests/mm: Use PKEY_UNRESTRICTED macro mm/pkey: Add PKEY_UNRESTRICTED macro * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64/sysreg: Add register fields for HFGWTR2_EL2 arm64/sysreg: Add register fields for HFGRTR2_EL2 arm64/sysreg: Add register fields for HFGITR2_EL2 arm64/sysreg: Add register fields for HDFGWTR2_EL2 arm64/sysreg: Add register fields for HDFGRTR2_EL2 arm64/sysreg: Update register fields for ID_AA64MMFR0_EL1 * for-next/misc: : Miscellaneous arm64 patches arm64: mm: Don't use %pK through printk arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare() * for-next/pgtable-cleanups: : arm64 pgtable accessors cleanup arm64/mm: Define PTDESC_ORDER arm64/kernel: Always use level 2 or higher for early mappings arm64/hugetlb: Consistently use pud_sect_supported() arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions * for-next/kselftest: : arm64 kselftest updates kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c * for-next/uaccess-mops: : Implement the uaccess memory copy/set using MOPS instructions arm64: lib: Use MOPS for usercopy routines arm64: mm: Handle PAN faults on uaccess CPY* instructions arm64: extable: Add fixup handling for uaccess CPY* instructions * for-next/pie-poe-cleanup: : PIE/POE helpers cleanup arm64/sysreg: Move POR_EL0_INIT to asm/por.h arm64/sysreg: Rename POE_RXW to POE_RWX arm64/sysreg: Improve PIR/POR helpers * for-next/cputype-kryo: : Add cputype info for some Qualcomm Kryo cores arm64: cputype: Add comments about Qualcomm Kryo 5XX and 6XX cores arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD * for-next/cca-dma-address: : Fix DMA address for devices used in realms with Arm CCA arm64: realm: Use aliased addresses for device DMA to shared buffers dma: Introduce generic dma_addr_*crypted helpers dma: Fix encryption bit clearing for dma_to_phys * for-next/drop-pxd_table_bit: : Drop the arm64 PXD_TABLE_BIT (clean-up in preparation for 128-bit PTEs) arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() arm64/mm: Check PUD_TYPE_TABLE in pud_bad() arm64/mm: Check PXD_TYPE_TABLE in [p4d|pgd]_bad() arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd|pud]_mkhuge() arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd|pud]_sect_prot() arm64/ptdump: Test PMD_TYPE_MASK for block mapping KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping * for-next/spectre-bhb-assume-vulnerable: : Rework Spectre BHB mitigations to not assume "safe" arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
2025-03-25Merge tag 'timers-vdso-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO infrastructure updates from Thomas Gleixner: - Consolidate the VDSO storage The VDSO data storage and data layout has been largely architecture specific for historical reasons. That increases the maintenance effort and causes inconsistencies over and over. There is no real technical reason for architecture specific layouts and implementations. The architecture specific details can easily be integrated into a generic layout, which also reduces the amount of duplicated code for managing the mappings. Convert all architectures over to a unified layout and common mapping infrastructure. This splits the VDSO data layout into subsystem specific blocks, timekeeping, random and architecture parts, which provides a better structure and allows to improve and update the functionalities without conflict and interaction. - Rework the timekeeping data storage The current implementation is designed for exposing system timekeeping accessors, which was good enough at the time when it was designed. PTP and Time Sensitive Networking (TSN) change that as there are requirements to expose independent PTP clocks, which are not related to system timekeeping. Replace the monolithic data storage by a structured layout, which allows to add support for independent PTP clocks on top while reusing both the data structures and the time accessor implementations. * tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) sparc/vdso: Always reject undefined references during linking x86/vdso: Always reject undefined references during linking vdso: Rework struct vdso_time_data and introduce struct vdso_clock vdso: Move architecture related data before basetime data powerpc/vdso: Prepare introduction of struct vdso_clock arm64/vdso: Prepare introduction of struct vdso_clock x86/vdso: Prepare introduction of struct vdso_clock time/namespace: Prepare introduction of struct vdso_clock vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct vdso/vsyscall: Prepare introduction of struct vdso_clock vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock vdso/gettimeofday: Prepare introduction of struct vdso_clock vdso/helpers: Prepare introduction of struct vdso_clock vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks vdso: Make vdso_time_data cacheline aligned arm64: Make asm/cache.h compatible with vDSO ...
2025-03-25tcp/dccp: remove icsk->icsk_timeoutEric Dumazet
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge tag 'timers-core-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - Fix a memory ordering issue in posix-timers Posix-timer lookup is lockless and reevaluates the timer validity under the timer lock, but the update which validates the timer is not protected by the timer lock. That allows the store to be reordered against the initialization stores, so that the lookup side can observe a partially initialized timer. That's mostly a theoretical problem, but incorrect nevertheless. - Fix a long standing inconsistency of the coarse time getters The coarse time getters read the base time of the current update cycle without reading the actual hardware clock. NTP frequency adjustment can set the base time backwards. The fine grained interfaces compensate this by reading the clock and applying the new conversion factor, but the coarse grained time getters use the base time directly. That allows the user to observe time going backwards. Cure it by always forwarding base time, when NTP changes the frequency with an immediate step. - Rework of posix-timer hashing The posix-timer hash is not scalable and due to the CRIU timer restore mechanism prone to massive contention on the global hash bucket lock. Replace the global hash lock with a fine grained per bucket locking scheme to address that. - Rework the proc/$PID/timers interface. /proc/$PID/timers is provided for CRIU to be able to restore a timer. The printout happens with sighand lock held and interrupts disabled. That's not required as this can be done with RCU protection as well. - Provide a sane mechanism for CRIU to restore a timer ID CRIU restores timers by creating and deleting them until the kernel internal per process ID counter reached the requested ID. That's horribly slow for sparse timer IDs. Provide a prctl() which allows CRIU to restore a timer with a given ID. When enabled the ID pointer is used as input pointer to read the requested ID from user space. When disabled, the normal allocation scheme (next ID) is active as before. This is backwards compatible for both kernel and user space. - Make hrtimer_update_function() less expensive. The sanity checks are valuable, but expensive for high frequency usage in io/uring. Make the debug checks conditional and enable them only when lockdep is enabled. - Small updates, cleanups and improvements * tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) selftests/timers: Improve skew_consistency by testing with other clockids timekeeping: Fix possible inconsistencies in _COARSE clockids posix-timers: Drop redundant memset() invocation selftests/timers/posix-timers: Add a test for exact allocation mode posix-timers: Provide a mechanism to allocate a given timer ID posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held posix-timers: Make per process list RCU safe posix-timers: Avoid false cacheline sharing posix-timers: Switch to jhash32() posix-timers: Improve hash table performance posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t posix-timers: Make lock_timer() use guard() posix-timers: Rework timer removal posix-timers: Simplify lock/unlock_timer() posix-timers: Use guards in a few places posix-timers: Remove SLAB_PANIC from kmem cache posix-timers: Remove a few paranoid warnings posix-timers: Cleanup includes posix-timers: Add cond_resched() to posix_timer_add() search loop posix-timers: Initialise timer before adding it to the hash table ...
2025-03-25selftests/pidfd: fixes syscall number definesOleg Nesterov
I had to spend some (a lot;) time to understand why pidfd_info_test (and more) fails with my patch under qemu on my machine ;) Until I applied the patch below. I think it is a bad idea to do the things like #ifndef __NR_clone3 #define __NR_clone3 -1 #endif because this can hide a problem. My working laptop runs Fedora-23 which doesn't have __NR_clone3/etc in /usr/include/. So "make" happily succeeds, but everything fails and it is not clear why. Link: https://lore.kernel.org/r/20250323174518.GB834@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25selftests: livepatch: test if ftrace can trace a livepatched functionFilipe Xavier
This new test makes sure that ftrace can trace a function that was introduced by a livepatch. Signed-off-by: Filipe Xavier <felipeaggger@gmail.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-2-d9d7cc386c75@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25selftests: livepatch: add new ftrace helpers functionsFilipe Xavier
Add new ftrace helpers functions cleanup_tracing, trace_function and check_traced_functions. Signed-off-by: Filipe Xavier <felipeaggger@gmail.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-1-d9d7cc386c75@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25iommufd/selftest: Add coverage for iommufd pasid attach/detachYi Liu
This tests iommufd pasid attach/replace/detach. Link: https://patch.msgid.link/r/20250321171940.7213-19-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25selftests/net: Drop timeout argument from test_client_verify()Dmitry Safonov
It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test, that is more paranoia-long timeout rather than based on requirements. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Delete timeout from test_connect_socket()Dmitry Safonov
Unused: it's always either the default timeout or asynchronous connect(). Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Print the testing side in unsigned-md5Dmitry Safonov
As both client and server print the same test name on failure or pass, add "[server]" so that it's more obvious from a log which side printed "ok" or "not ok". Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Add mixed select()+polling mode to TCP-AO testsDmitry Safonov
Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one, TEST_RETRANSMIT_SEC is used for operations that are expected to succeed in order for a test to pass. It is usually not consumed and exists only to avoid indefinite test run if the operation didn't complete. The second one, TEST_RETRANSMIT_SEC exists for the tests that checking operations, that are expected to fail/timeout. It is shorter as it is fully consumed, with an expectation that if operation didn't succeed during that period, it will timeout. And the related test that expects the timeout is passing. The actual operation failure is then cross-verified by other means like counters checks. The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact initial TCP timeout. So, in case the initial segment gets lost (quite unlikely on local veth interface between two net namespaces, yet happens in slow VMs), the retransmission never happens and as a result, the test is not actually testing the functionality. Which in the end fails counters checks. As I want tcp_ao selftests to be fast and finishing in a reasonable amount of time on manual run, I didn't consider increasing TEST_RETRANSMIT_SEC. Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to make the initial TCP timeout shorter. But as it's not a socket bpf attached thing, but sock_ops (attaches to cgroups), the selftests would have to use libbpf, which I wanted to avoid if not absolutely required. Instead, use a mixed select() and counters polling mode with the longer TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It actually not only allows losing segments and succeeding after the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes the tests expecting timeout/failure pass faster. The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny "wrong snd id", which checks for no key on SYN-ACK for which there is no counter in the kernel (see tcp_make_synack()). Yet it can be speed up by poking skpair from the trace event (see trace_tcp_ao_synack_no_key). Fixes: ed9d09b309b1 ("selftests/net: Add a test for TCP-AO keys matching") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/ Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Fetch and check TCP-MD5 countersDmitry Safonov
There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests that can benefit from checking the related counters, not only from validating operations timeouts. It also prepares the code for introduction of mixed select()+poll mode, see the follow-up patches. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Provide tcp-ao counters comparison helperDmitry Safonov
Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they are asserts, rather than just compare functions. Provide test_cmp_counters() helper, that's going to be used to compare ao_info and netns counters as a stop condition for polling the sockets. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Print TCP flags in more common formatDmitry Safonov
Before: ># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 After: ># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 For the history, I think the initial format was to emphasize the absence of flags as well as their presence (!R meant no RST flag). But looking again, it's just unreadable and hard to understand. Make it the standard/expected one. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACESong Liu
CONFIG_KPROBES_ON_FTRACE is required for test-kprobe. Skip test-kprobe when CONFIG_KPROBES_ON_FTRACE is not set. Since some kernel may not have /proc/config.gz, grep for kprobe_ftrace_ops from /proc/kallsyms to check whether CONFIG_KPROBES_ON_FTRACE is enabled. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/20250318181518.1055532-1-song@kernel.org [pmladek@suse.com: Call grep with -q option.] Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25objtool: Remove redundant opts.noinstr dependencyJosh Poimboeuf
The --noinstr dependecy on --link is already enforced in the cmdline arg parsing code. Remove the redundant check. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/0ead7ffa0f5be2e81aebbcc585e07b2c98702b44.1742852847.git.jpoimboe@kernel.org
2025-03-25objtool: Fix up some outdated references to ENTRY/ENDPROCJosh Poimboeuf
ENTRY and ENDPROC were deprecated years ago and replaced with SYM_FUNC_{START,END}. Fix up a few outdated references in the objtool documentation and comments. Also fix a few typos. Suggested-by: Brendan Jackman <jackmanb@google.com> Suggested-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/5eb7e06e1a0e87aaeda8d583ab060e7638a6ea8e.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Reduce CONFIG_OBJTOOL_WERROR verbosityJosh Poimboeuf
Remove the following from CONFIG_OBJTOOL_WERROR: * backtrace * "upgraded warnings to errors" message * cmdline args This makes the default output less cluttered and makes it easier to spot the actual warnings. Note the above options are still are available with --verbose or OBJTOOL_VERBOSE=1. Also, do the cmdline arg printing on all warnings, regardless of werror. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/d61df69f64b396fa6b2a1335588aad7a34ea9e71.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Improve error handlingJosh Poimboeuf
Fix some error handling issues, improve error messages, properly distinguish betwee errors and warnings, and generally try to make all the error handling more consistent. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/3094bb4463dad29b6bd1bea03848d1571ace771c.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Properly disable uaccess validationJosh Poimboeuf
If opts.uaccess isn't set, the uaccess validation is disabled, but only partially: it doesn't read the uaccess_safe_builtin list but still tries to do the validation. Disable it completely to prevent false warnings. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/0e95581c1d2107fb5f59418edf2b26bba38b0cbb.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Silence more KCOV warningsJosh Poimboeuf
In the past there were issues with KCOV triggering unreachable instruction warnings, which is why unreachable warnings are now disabled with CONFIG_KCOV. Now some new KCOV warnings are showing up with GCC 14: vmlinux.o: warning: objtool: cpuset_write_resmask() falls through to next function cpuset_update_active_cpus.cold() drivers/usb/core/driver.o: error: objtool: usb_deregister() falls through to next function usb_match_device() sound/soc/codecs/snd-soc-wcd934x.o: warning: objtool: .text.wcd934x_slim_irq_handler: unexpected end of section All are caused by GCC KCOV not finishing an optimization, leaving behind a never-taken conditional branch to a basic block which falls through to the next function (or end of section). At a high level this is similar to the unreachable warnings mentioned above, in that KCOV isn't fully removing dead code. Treat it the same way by adding these to the list of warnings to ignore with CONFIG_KCOV. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/66a61a0b65d74e072d3dc02384e395edb2adc3c5.1742852846.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/Z9iTsI09AEBlxlHC@gmail.com Closes: https://lore.kernel.org/oe-kbuild-all/202503180044.oH9gyPeg-lkp@intel.com/
2025-03-25objtool: Fix init_module() handlingJosh Poimboeuf
If IBT is enabled and a module uses the deprecated init_module() magic function name rather than module_init(fn), its ENDBR will get removed, causing an IBT failure during module load. Objtool does print an obscure warning, but then does nothing to either correct it or return an error. Improve the usefulness of the warning and return an error so it will at least fail the build with CONFIG_OBJTOOL_WERROR. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/366bfdbe92736cde9fb01d5d3eb9b98e9070a1ec.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Fix X86_FEATURE_SMAP alternative handlingJosh Poimboeuf
For X86_FEATURE_SMAP alternatives which replace NOP with STAC or CLAC, uaccess validation skips the NOP branch to avoid following impossible code paths, e.g. where a STAC would be patched but a CLAC wouldn't. However, it's not safe to assume an X86_FEATURE_SMAP alternative is patching STAC/CLAC. There can be other alternatives, like static_cpu_has(), where both branches need to be validated. Fix that by repurposing ANNOTATE_IGNORE_ALTERNATIVE for skipping either original instructions or new ones. This is a more generic approach which enables the removal of the feature checking hacks and the insn->ignore bit. Fixes the following warnings: arch/x86/mm/fault.o: warning: objtool: do_user_addr_fault+0x8ec: __stack_chk_fail() missing __noreturn in .c/.h or NORETURN() in noreturns.h arch/x86/mm/fault.o: warning: objtool: do_user_addr_fault+0x8f1: unreachable instruction [ mingo: Fix up conflicts with recent x86 changes. ] Fixes: ea24213d8088 ("objtool: Add UACCESS validation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/de0621ca242130156a55d5d74fed86994dfa4c9c.1742852846.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/oe-kbuild-all/202503181736.zkZUBv4N-lkp@intel.com/
2025-03-25objtool: Ignore entire functions rather than instructionsJosh Poimboeuf
STACK_FRAME_NON_STANDARD applies to functions. Use a function-specific ignore attribute in preparation for getting rid of insn->ignore. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/4af13376567f83331a9372ae2bb25e11a3d0f055.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Warn when disabling unreachable warningsJosh Poimboeuf
Print a warning when disabling the unreachable warnings (due to a GCC bug). This will help determine if recent GCCs still have the issue and alert us if any other issues might be silently lurking behind the unreachable disablement. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/df243063787596e6031367e6659e7e43409d6c6d.1742852846.git.jpoimboe@kernel.org
2025-03-25objtool: Fix detection of consecutive jump tables on Clang 20Josh Poimboeuf
The jump table detection code assumes jump tables are in the same order as their corresponding indirect branches. That's apparently not always true with Clang 20. Fix that by changing how multiple jump tables are detected. In the first detection pass, mark the beginning of each jump table so the second pass can tell where one ends and the next one begins. Fixes the following warnings: vmlinux.o: warning: objtool: SiS_GetCRT2Ptr+0x1ad: stack state mismatch: cfa1=4+8 cfa2=5+16 sound/core/seq/snd-seq.o: warning: objtool: cc_ev_to_ump_midi2+0x589: return with modified stack frame Fixes: be2f0b1e1264 ("objtool: Get rid of reloc->jump_table_start") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/141752fff614eab962dba6bdfaa54aa67ff03bba.1742852846.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/oe-kbuild-all/202503171547.LlCTJLQL-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202503200535.J3hAvcjw-lkp@intel.com/
2025-03-24Merge tag 'x86-cleanups-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo" * tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/coco: Replace 'static const cc_mask' with the newly introduced cc_get_mask() function x86/delay: Fix inconsistent whitespace selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE() x86/platform: Fix missing declaration of 'x86_apple_machine' x86/irq: Fix missing declaration of 'io_apic_irqs' x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()
2025-03-24Merge tag 'x86-fpu-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/fpu updates from Ingo Molnar: - Improve crypto performance by making kernel-mode FPU reliably usable in softirqs ((Eric Biggers) - Fully optimize out WARN_ON_FPU() (Eric Biggers) - Initial steps to support Support Intel APX (Advanced Performance Extensions) (Chang S. Bae) - Fix KASAN for arch_dup_task_struct() (Benjamin Berg) - Refine and simplify the FPU magic number check during signal return (Chang S. Bae) - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav Spassov) - selftests/x86/xstate: Introduce common code for testing extended states (Chang S. Bae) - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros Bizjak) * tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Fix inconsistencies in guest FPU xfeatures x86/fpu: Clarify the "xa" symbolic name used in the XSTATE* macros x86/fpu: Use XSAVE{,OPT,C,S} and XRSTOR{,S} mnemonics in xstate.h x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs x86/fpu/xstate: Simplify print_xstate_features() x86/fpu: Refine and simplify the magic number check during signal return selftests/x86/xstate: Fix spelling mistake "hader" -> "header" x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct() vmlinux.lds.h: Remove entry to place init_task onto init_stack selftests/x86/avx: Add AVX tests selftests/x86/xstate: Clarify supported xstates selftests/x86/xstate: Consolidate test invocations into a single entry selftests/x86/xstate: Introduce signal ABI test selftests/x86/xstate: Refactor ptrace ABI test selftests/x86/xstate: Refactor context switching test selftests/x86/xstate: Enumerate and name xstate components selftests/x86/xstate: Refactor XSAVE helpers for general use selftests/x86: Consolidate redundant signal helper functions x86/fpu: Fix guest FPU state buffer allocation size x86/fpu: Fully optimize out WARN_ON_FPU()
2025-03-24Merge tag 'x86-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye" * tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits) zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault x86/asm: Make asm export of __ref_stack_chk_guard unconditional x86/mm: Only do broadcast flush from reclaim if pages were unmapped perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones perf/x86/intel, x86/cpu: Simplify Intel PMU initialization x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions x86/asm: Use asm_inline() instead of asm() in clwb() x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h> x86/hweight: Use asm_inline() instead of asm() x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm() x86/hweight: Use named operands in inline asm() x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP x86/head/64: Avoid Clang < 17 stack protector in startup code x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro x86/cpu/intel: Limit the non-architectural constant_tsc model checks x86/mm/pat: Replace Intel x86_model checks with VFM ones x86/cpu/intel: Fix fast string initialization for extended Families ...
2025-03-24Merge tag 'perf-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Core: - Move perf_event sysctls into kernel/events/ (Joel Granados) - Use POLLHUP for pinned events in error (Namhyung Kim) - Avoid the read if the count is already updated (Peter Zijlstra) - Allow the EPOLLRDNORM flag for poll (Tao Chen) - locking/percpu-rwsem: Add guard support [ NOTE: this got (mis-)merged into the perf tree due to related work ] (Peter Zijlstra) perf_pmu_unregister() related improvements: (Peter Zijlstra) - Simplify the perf_event_alloc() error path - Simplify the perf_pmu_register() error path - Simplify perf_pmu_register() - Simplify perf_init_event() - Simplify perf_event_alloc() - Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count - Add this_cpc() helper - Introduce perf_free_addr_filters() - Robustify perf_event_free_bpf_prog() - Simplify the perf_mmap() control flow - Further simplify perf_mmap() - Remove retry loop from perf_mmap() - Lift event->mmap_mutex in perf_mmap() - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes - Fix perf_mmap() failure path Uprobes: - Harden x86 uretprobe syscall trampoline check (Jiri Olsa) - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang) - Remove the spinlock within handle_singlestep() (Liao Chang) x86 Intel PMU enhancements: - Support PEBS counters snapshotting (Kan Liang) - Fix intel_pmu_read_event() (Kan Liang) - Extend per event callchain limit to branch stack (Kan Liang) - Fix system-wide LBR profiling (Kan Liang) - Allocate bts_ctx only if necessary (Li RongQing) - Apply static call for drain_pebs (Peter Zijlstra) x86 AMD PMU enhancements: (Ravi Bangoria) - Remove pointless sample period check - Fix ->config to sample period calculation for OP PMU - Fix perf_ibs_op.cnt_mask for CurCnt - Don't allow freq mode event creation through ->config interface - Add PMU specific minimum period - Add ->check_period() callback - Ceil sample_period to min_period - Add support for OP Load Latency Filtering - Update DTLB/PageSize decode logic Hardware breakpoints: - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar) Hardlockup detector improvements: (Li Huafei) - perf_event memory leak - Warn if watchdog_ev is leaked Fixes and cleanups: - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)" * tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf: Fix __percpu annotation perf: Clean up pmu specific data perf/x86: Remove swap_task_ctx() perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode perf: Supply task information to sched_task() perf: attach/detach PMU specific data locking/percpu-rwsem: Add guard support perf: Save PMU specific data in task_struct perf: Extend per event callchain limit to branch stack perf/ring_buffer: Allow the EPOLLRDNORM flag for poll perf/core: Use POLLHUP for pinned events in error perf/core: Use sysfs_emit() instead of scnprintf() perf/core: Remove optional 'size' arguments from strscpy() calls perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions uprobes/x86: Harden uretprobe syscall trampoline check watchdog/hardlockup/perf: Warn if watchdog_ev is leaked watchdog/hardlockup/perf: Fix perf_event memory leak perf/x86: Annotate struct bts_buffer::buf with __counted_by() perf/core: Clean up perf_try_init_event() perf/core: Fix perf_mmap() failure path ...
2025-03-24Merge tag 'sched-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)" * tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling() sched/debug: Remove CONFIG_SCHED_DEBUG sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional sched/debug: Make 'const_debug' tunables unconditional __read_mostly sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() rseq/selftests: Fix namespace collision with rseq UAPI header include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h sched/topology: Stop exposing partition_sched_domains_locked cgroup/cpuset: Remove partition_and_rebuild_sched_domains sched/topology: Remove redundant dl_clear_root_domain call sched/deadline: Rebuild root domain accounting after every update sched/deadline: Generalize unique visiting of root domains sched/topology: Wrappers for sched_domains_mutex sched/deadline: Ignore special tasks when rebuilding domains tracing: Use preempt_model_str() xtensa: Rely on generic printing of preemption model x86: Rely on generic printing of preemption model s390: Rely on generic printing of preemption model ...
2025-03-24Merge tag 'objtool-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - The biggest change is the new option to automatically fail the build on objtool warnings: CONFIG_OBJTOOL_WERROR. While there are no currently known unfixed false positives left, such an expansion in the severity of objtool warnings inevitably creates a risk of build failures, so it's disabled by default and depends on !COMPILE_TEST, so it shouldn't be enabled on allyesconfig/allmodconfig builds and won't be forced on people who just accept build-time defaults in 'make oldconfig'. While the option is strongly recommended, only people who enable it explicitly should see it. (Josh Poimboeuf) - Disable branch profiling in noinstr code with a broad brush that includes all of arch/x86/ and kernel/sched/. (Josh Poimboeuf) - Create backup object files on objtool errors and print exact objtool arguments to make failure analysis easier (Josh Poimboeuf) - Improve noreturn handling (Josh Poimboeuf) - Improve rodata handling (Tiezhu Yang) - Support jump tables, switch tables and goto tables on LoongArch (Tiezhu Yang) - Misc cleanups and fixes (Josh Poimboeuf, David Engraf, Ingo Molnar) * tag 'objtool-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) tracing: Disable branch profiling in noinstr code objtool: Use O_CREAT with explicit mode mask objtool: Add CONFIG_OBJTOOL_WERROR objtool: Create backup on error and print args objtool: Change "warning:" to "error:" for --Werror objtool: Add --Werror option objtool: Add --output option objtool: Upgrade "Linked object detected" warning to error objtool: Consolidate option validation objtool: Remove --unret dependency on --rethunk objtool: Increase per-function WARN_FUNC() rate limit objtool: Update documentation objtool: Improve __noreturn annotation warning objtool: Fix error handling inconsistencies in check() x86/traps: Make exc_double_fault() consistently noreturn LoongArch: Enable jump table for objtool objtool/LoongArch: Add support for goto table objtool/LoongArch: Add support for switch table objtool: Handle PC relative relocation type objtool: Handle different entry size of rodata ...
2025-03-24Merge tag 'rcu-next-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: "Documentation: - Add broken-timing possibility to stallwarn.rst - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr() - Document self-propagating callbacks - Point call_srcu() to call_rcu() for detailed memory ordering - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text - Remove references to old grace-period-wait primitives srcu: - Introduce srcu_read_{un,}lock_fast(), which is similar to srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the cost of calling synchronize_rcu() in synchronize_srcu() Moreover, by returning the percpu offset of the counter at srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid extra pointer dereferencing, which makes it faster than srcu_read_{un,}lock_lite() srcu_read_{un,}lock_fast() are intended to replace rcu_read_{un,}lock_trace() if possible RCU torture: - Add get_torture_init_jiffies() to return the start time of the test - Add a test_boost_holdoff module parameter to allow delaying boosting tests when building rcutorture as built-in - Add grace period sequence number logging at the beginning and end of failure/close-call results - Switch to hexadecimal for the expedited grace period sequence number in the rcu_exp_grace_period trace point - Make cur_ops->format_gp_seqs take buffer length - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool - Complain when invalid SRCU reader_flavor is specified - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU uses atomics even when percpu ops are NMI safe, and use the Kconfig for SRCU lockdep testing Misc: - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes() - Fix get_state_synchronize_rcu_full() GP-start detection - Move RCU Tasks self-tests to core_initcall() - Print segment lengths in show_rcu_nocb_gp_state() - Make RCU watch ct_kernel_exit_state() warning - Flush console log from kernel_power_off() - rcutorture: Allow a negative value for nfakewriters - rcu: Update TREE05.boot to test normal synchronize_rcu() - rcu: Use _full() API to debug synchronize_rcu() Make RCU handle PREEMPT_LAZY better: - Fix header guard for rcu_all_qs() - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY - Update __cond_resched comment about RCU quiescent states - Handle unstable rdp in rcu_read_unlock_strict() - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y - osnoise: Provide quiescent states - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y combination - Limit PREEMPT_RCU configurations - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y" * tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits) rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y rcu: limit PREEMPT_RCU configurations rcutorture: Update ->extendables check for lazy preemption rcutorture: Update rcutorture_one_extend_check() for lazy preemption osnoise: provide quiescent states rcu: Use _full() API to debug synchronize_rcu() rcu: Update TREE05.boot to test normal synchronize_rcu() rcutorture: Allow a negative value for nfakewriters Flush console log from kernel_power_off() context_tracking: Make RCU watch ct_kernel_exit_state() warning rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state() rcu-tasks: Move RCU Tasks self-tests to core_initcall() rcu: Fix get_state_synchronize_rcu_full() GP-start detection torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe() srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing rcutorture: Complain when invalid SRCU reader_flavor is specified rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool rcutorture: Make cur_ops->format_gp_seqs take buffer length rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output ...
2025-03-24Merge tag 'bitmap-for-6.15' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - cpumask_next_wrap() rework (me) - GENMASK() simplification (I Hsin) - rust bindings for cpumasks (Viresh and me) - scattered cleanups (Andy, Tamir, Vincent, Ignacio and Joel) * tag 'bitmap-for-6.15' of https://github.com/norov/linux: (22 commits) cpumask: align text in comment riscv: fix test_and_{set,clear}_bit ordering documentation treewide: fix typo 'unsigned __init128' -> 'unsigned __int128' MAINTAINERS: add rust bindings entry for bitmap API rust: Add cpumask helpers uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)" cpumask: drop cpumask_next_wrap_old() PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap() scsi: lpfc: rework lpfc_next_{online,present}_cpu() scsi: lpfc: switch lpfc_irq_rebalance() to using cpumask_next_wrap() s390: switch stop_machine_yield() to using cpumask_next_wrap() padata: switch padata_find_next() to using cpumask_next_wrap() cpumask: use cpumask_next_wrap() where appropriate cpumask: re-introduce cpumask_next{,_and}_wrap() cpumask: deprecate cpumask_next_wrap() powerpc/xmon: simplify xmon_batch_next_cpu() ibmvnic: simplify ibmvnic_set_queue_affinity() virtio_net: simplify virtnet_set_affinity() objpool: rework objpool_pop() cpumask: add for_each_{possible,online}_cpu_wrap ...
2025-03-24Merge tag 'lkmm.2025.03.21a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull kernel memory model updates from Paul McKenney: "Add more atomic operations, rework tags, and update documentation: - Add additional atomic operations (Puranjay Mohan) - Make better use of herd7 tags (Jonas Oberhauser) - Update documentation (Akira Yokosawa) These changes require v7.58 of the herd7 and klitmus tools, up from v7.52" * tag 'lkmm.2025.03.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/memory-model: glossary.txt: Fix indents tools/memory-model/README: Fix typo tools/memory-model: Distinguish between syntactic and semantic tags tools/memory-model: Switch to softcoded herd7 tags tools/memory-model: Define effect of Mb tags on RMWs in tools/... tools/memory-model: Define applicable tags on operation in tools/... tools/memory-model: Legitimize current use of tags in LKMM macros tools/memory-model: Add atomic_andnot() with its variants tools/memory-model: Add atomic_and()/or()/xor() and add_negative
2025-03-24Merge tag 'nolibc-20250308-for-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - 32bit s390 support - opendir() and friends - openat() support - sscanf() support - various cleanups [ Paul has just forwarded the pull request from Thomas Weißschuh, so the tag signature is from Thomas, not Paul - Linus ] * tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits) tools/nolibc: don't use asm/ UAPI headers selftests/nolibc: stop testing constructor order selftests/nolibc: use O_RDONLY flag instead of 0 tools/nolibc: drop outdated example from overview comment tools/nolibc: process open() vararg as mode_t tools/nolibc: always use openat(2) instead of open(2) tools/nolibc: add support for openat(2) selftests/nolibc: add armthumb configuration selftests/nolibc: explicitly enable ARM mode Revert "selftests: kselftest: Fix build failure with NOLIBC" tools/nolibc: add support for [v]sscanf() tools/nolibc: add support for 32-bit s390 selftests/nolibc: rename s390 to s390x selftests/nolibc: only run constructor tests on nolibc selftests/nolibc: split up architecture list in run-tests.sh tools/nolibc: add support for directory access tools/nolibc: add support for sys_llseek() selftests/nolibc: always keep test kernel configuration up to date selftests/nolibc: execute defconfig before other targets selftests/nolibc: drop call to mrproper target ...
2025-03-24perf bpf-filter: Fix a parsing error with commaNamhyung Kim
The previous change to support cgroup filters introduced a bug that pathname can include commas. It confused the lexer to treat an item and the trailing comma as a single token. And it resulted in a parse error: $ sudo perf record -e cycles:P --filter 'period > 0, ip > 64' -- true perf_bpf_filter: Error: Unexpected item: 0, perf_bpf_filter: syntax error, unexpected BFT_ERROR, expecting BFT_NUM Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] --filter <filter> event filter It should get "0" and "," separately. An easiest fix would be to remove "," from the possible pathname characters. As it's for cgroup names, probably ok to assume it won't have commas in the pathname. I found that the existing BPF filtering test didn't have any complex filter condition with commas. Let's update the group filter test which is supposed to test filter combinations like this. Link: https://lore.kernel.org/r/20250307220922.434319-1-namhyung@kernel.org Fixes: 91e88437d5156b20 ("perf bpf-filter: Support filtering on cgroups") Reported-by: Sally Shi <sshii@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24Merge tag 'sched_ext-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add mechanism to count and report internal events. This significantly improves visibility on subtle corner conditions. - The default idle CPU selection logic is revamped and improved in multiple ways including being made topology aware. - sched_ext was disabling ttwu_queue for simplicity, which can be costly when hardware topology is more complex. Implement SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively enable ttwu_queue. - tools/sched_ext updates to improve compatibility among others. - Other misc updates and fixes. - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite fixes and resolve conflicts. * tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits) sched_ext: idle: Refactor scx_select_cpu_dfl() sched_ext: idle: Honor idle flags in the built-in idle selection policy sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local() sched_ext: Add trace point to track sched_ext core events sched_ext: Change the event type from u64 to s64 sched_ext: Documentation: add task lifecycle summary tools/sched_ext: Provide a compatible helper for scx_bpf_events() selftests/sched_ext: Add NUMA-aware scheduler test tools/sched_ext: Provide consistent access to scx flags sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior sched_ext: idle: Introduce scx_bpf_nr_node_ids() sched_ext: idle: Introduce node-aware idle cpu kfunc helpers sched_ext: idle: Per-node idle cpumasks sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE sched_ext: idle: Make idle static keys private sched/topology: Introduce for_each_node_numadist() iterator mm/numa: Introduce nearest_node_nodemask() nodemask: numa: reorganize inclusion path nodemask: add nodes_copy() tools/sched_ext: Sync with scx repo ...
2025-03-24perf report: Fix a memory leak for perf_env on AMDNamhyung Kim
The env.pmu_mapping can be leaked when it reads data from a pipe on AMD. For a pipe data, it reads the header data including pmu_mapping from PERF_RECORD_HEADER_FEATURE runtime. But it's already set in: perf_session__new() __perf_session__new() evlist__init_trace_event_sample_raw() evlist__has_amd_ibs() perf_env__nr_pmu_mappings() Then it'll overwrite that when it processes the HEADER_FEATURE record. Here's a report from address sanitizer. Direct leak of 2689 byte(s) in 1 object(s) allocated from: #0 0x7fed8f814596 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x5595a7d416b1 in strbuf_grow util/strbuf.c:64 #2 0x5595a7d414ef in strbuf_init util/strbuf.c:25 #3 0x5595a7d0f4b7 in perf_env__read_pmu_mappings util/env.c:362 #4 0x5595a7d12ab7 in perf_env__nr_pmu_mappings util/env.c:517 #5 0x5595a7d89d2f in evlist__has_amd_ibs util/amd-sample-raw.c:315 #6 0x5595a7d87fb2 in evlist__init_trace_event_sample_raw util/sample-raw.c:23 #7 0x5595a7d7f893 in __perf_session__new util/session.c:179 #8 0x5595a7b79572 in perf_session__new util/session.h:115 #9 0x5595a7b7e9dc in cmd_report builtin-report.c:1603 #10 0x5595a7c019eb in run_builtin perf.c:351 #11 0x5595a7c01c92 in handle_internal_command perf.c:404 #12 0x5595a7c01deb in run_argv perf.c:448 #13 0x5595a7c02134 in main perf.c:556 #14 0x7fed85833d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Let's free the existing pmu_mapping data if any. Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250311000416.817631-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24Merge tag 'seccomp-v6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - avoid the lock trip seccomp_filter_release in common case (Mateusz Guzik) - remove unused 'sd' argument through-out (Oleg Nesterov) - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64 * tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: avoid the lock trip seccomp_filter_release in common case seccomp: remove the 'sd' argument from __seccomp_filter() seccomp: remove the 'sd' argument from __secure_computing() seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER seccomp/mips: change syscall_trace_enter() to use secure_computing() selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
2025-03-24Merge tag 'move-lib-kunit-v6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull lib kunit selftest move from Kees Cook: "This is a one-off tree to coordinate the move of selftests out of lib/ and into lib/tests/. A separate tree was used for this to keep the paths sane with all the work in the same place. - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir Duberstein) - lib/math: Add int_log test suite (Bruno Sobreira França) - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin) - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego Vieira) - unicode: refactor selftests into KUnit (Gabriela Bittencourt) - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein) - printf: convert self-test to KUnit (Tamir Duberstein) - scanf: convert self-test to KUnit (Tamir Duberstein)" * tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits) scanf: break kunit into test cases scanf: convert self-test to KUnit scanf: remove redundant debug logs scanf: implicate test line in failure messages printf: implicate test line in failure messages printf: break kunit into test cases printf: convert self-test to KUnit kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR() kunit/fortify: Expand testing of __compiletime_strlen() kunit/stackinit: Use fill byte different from Clang i386 pattern kunit/overflow: Fix DEFINE_FLEX tests for counted_by selftests: remove reference to prime_numbers.sh MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING lib/prime_numbers: convert self-test to KUnit lib/math: Add Kunit test suite for gcd() unicode: kunit: change tests filename and path unicode: kunit: refactor selftest to kunit tests lib/tests/kfifo_kunit.c: add tests for the kfifo structure lib: Move KUnit tests into tests/ subdirectory lib/math: Add int_log test suite ...
2025-03-24selftests: vxlan_bridge: Test flood with unresolved FDB entryAmit Cohen
Extend flood test to configure FDB entry with unresolved destination IP, check that packets are not sent twice. Without the previous patch which handles such scenario in mlxsw, the tests fail: $ TESTS='test_flood' ./vxlan_bridge_1d.sh Running tests with UDP port 4789 TEST: VXLAN: flood [ OK ] TEST: VXLAN: flood, unresolved FDB entry [FAIL] vx2 ns2: Expected to capture 10 packets, got 20. $ TESTS='test_flood' ./vxlan_bridge_1q.sh INFO: Running tests with UDP port 4789 TEST: VXLAN: flood vlan 10 [ OK ] TEST: VXLAN: flood vlan 20 [ OK ] TEST: VXLAN: flood vlan 10, unresolved FDB entry [FAIL] vx10 ns2: Expected to capture 10 packets, got 20. TEST: VXLAN: flood vlan 20, unresolved FDB entry [FAIL] vx20 ns2: Expected to capture 10 packets, got 20. With the previous patch, the tests pass. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tools/rv: Allow rv list to filter for containerGabriele Monaco
Add possibility to supply the container name to rv list: # rv list sched mon1 mon2 mon3 This lists only monitors in sched, without indentation. Supplying -h, any option (string starting with -) or more than 1 argument will still print the usage. Passing a non-existent container prints nothing and passing no container continues to print all monitors, showing indentation for nested monitors, reported after their container. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-10-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24verification/dot2k: Add support for nested monitorsGabriele Monaco
RV now supports nested monitors, this functionality requires a container monitor, which has virtually no functionality besides holding other monitors, and nested monitors, that have a container as parent. Add the -p flag to pass a parent to a monitor, this sets it up while registering the monitor and adds necessary includes and configurations. Add the -c flag to create a container, since containers are empty, we don't allow supplying a dot model or a monitor type, the template is also different since functions to enable and disable the monitor are not defined, nor any tracepoint. The generated header file only allows to include the rv_monitor structure in children monitors. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24tools/rv: Add support for nested monitorsGabriele Monaco
RV now supports nested monitors, this functionality requires a container monitor, which has virtually no functionality besides holding other monitors, and nested monitors, that have a container as parent. Nested monitors' sysfs folders are physically nested in the container's folder, and they are listed in the available_monitors file with the notation container:monitor. These changes go against the assumption that each line in the available_monitors file correspond to a folder in the rv directory, breaking the functionality of the rv tool. Add support for nested containers in the rv userspace tool, indenting nested monitors while listed and allowing both the notation with and without container name, which are equivalent: # rv list mon1 mon2 container: - nested1 - nested2 ## notation with container name # rv mon container:nested1 ## notation without container name # rv mon nested1 Either way, enabling a nested monitor is the same as enabling any other non-nested monitor. Selecting the container with rv mon enables all the nested monitors, if -t is passed, the trace also includes the monitor name next to the event: # rv mon nested1 -t <idle>-0 [004] event state1 x event -> state2 <idle>-0 [004] error event not expected in state2 # rv mon sched -t <idle>-0 [004] event_nested1 state1 x event -> state2 <idle>-0 [004] error_nested1 event not expected in state2 Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>