summaryrefslogtreecommitdiff
path: root/arch/arm64/kernel/entry-common.c
AgeCommit message (Collapse)Author
8 daysMerge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', ↵Catalin Marinas
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (23 commits) drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation perf/cxlpmu: Remove unintended newline from IRQ name format string perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe() perf: arm_spe: Relax period restriction perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) KVM: arm64: nvhe: Disable branch generation in nVHE guests arm64: Handle BRBE booting requirements arm64/sysreg: Add BRBE registers and fields perf/arm: Add missing .suppress_bind_attrs perf/arm-cmn: Reduce stack usage during discovery perf: imx9_perf: make the read-only array mask static const perf/arm-cmn: Broaden module description for wider interconnect support ... * for-next/livepatch: : Support for HAVE_LIVEPATCH on arm64 arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: Implement HAVE_LIVEPATCH arm64: stacktrace: Implement arch_stack_walk_reliable() arm64: stacktrace: Check kretprobe_find_ret_addr() return value arm64/module: Use text-poke API for late relocations. * for-next/user-contig-bbml2: : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts arm64/mm: Elide tlbi in contpte_convert() under BBML2 iommu/arm: Add BBM Level 2 smmu feature arm64: Add BBM Level 2 cpu feature arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type * for-next/misc: : Miscellaneous arm64 patches arm64/gcs: task_gcs_el0_enable() should use passed task arm64: signal: Remove ISB when resetting POR_EL0 arm64/mm: Drop redundant addr increment in set_huge_pte_at() arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get arm64: pi: use 'targets' instead of extra-y in Makefile * for-next/acpi: : Various ACPI arm64 changes ACPI: Suppress misleading SPCR console message when SPCR table is absent ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled * for-next/debug-entry: : Simplify the debug exception entry path arm64: debug: remove debug exception registration infrastructure arm64: debug: split bkpt32 exception entry arm64: debug: split brk64 exception entry arm64: debug: split hardware watchpoint exception entry arm64: debug: split single stepping exception entry arm64: debug: refactor reinstall_suspended_bps() arm64: debug: split hardware breakpoint exception entry arm64: entry: Add entry and exit functions for debug exceptions arm64: debug: remove break/step handler registration infrastructure arm64: debug: call step handlers statically arm64: debug: call software breakpoint handlers statically arm64: refactor aarch32_break_handler() arm64: debug: clean up single_step_handler logic * for-next/feat_mte_tagged_far: : Support for reporting the non-address bits during a synchronous MTE tag check fault kselftest/arm64/mte: Add mtefar tests on check_mmap_options kselftest/arm64/mte: Refactor check_mmap_option test kselftest/arm64/mte: Add verification for address tag in signal handler kselftest/arm64/mte: Add address tag related macro and function kselftest/arm64/mte: Check MTE_FAR feature is supported kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS kselftest/arm64: Add MTE_FAR hwcap test KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature * for-next/kselftest: : Kselftest updates for arm64 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems kselftest/arm4: Provide local defines for AT_HWCAP3 kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace kselftest/arm64: Fix check for setting new VLs in sve-ptrace kselftest/arm64: Convert tpidr2 test to use kselftest.h * for-next/mdscr-cleanup: : Drop redundant DBG_MDSCR_* macros KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t arm64/debug: Drop redundant DBG_MDSCR_* macros * for-next/vmap-stack: : Force VMAP_STACK on arm64 arm64: remove CONFIG_VMAP_STACK checks from entry code arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN arm64: efi: Remove CONFIG_VMAP_STACK check arm64: Mandate VMAP_STACK arm64: efi: Fix KASAN false positive for EFI runtime stack arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-08arm64: remove CONFIG_VMAP_STACK checks from entry codeBreno Leitao
With VMAP_STACK now always enabled on arm64, remove all CONFIG_VMAP_STACK conditionals from entry handling in arch/arm64/kernel/entry-common.c and arch/arm64/kernel/entry.S. This change unconditionally includes the bad stack handling and overflow detection logic, simplifying the code and reflecting the mandatory use of VMAP_STACK for all arm64 kernel builds. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-8-8de98ca0f91c@debian.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: remove debug exception registration infrastructureAda Couprie Diaz
Now that debug exceptions are handled individually and without the need for dynamic registration, remove the unused registration infrastructure. This removes the external caller for `debug_exception_enter()` and `debug_exception_exit()`. Make them static again and remove them from the header. Remove `early_brk64()` as it has been made redundant by (arm64: debug: split brk64 exception entry) and is not used anymore. Note : in `early_brk64()` `bug_brk_handler()` is called unconditionally as a fall-through, but now `call_break_hook()` only calls it if the immediate matches. This does not change the behaviour in early boot, as if `bug_brk_handler()` was called on a non-BUG immediate it would return DBG_HOOK_ERROR anyway, which `call_break_hook()` will do if no immediate matches. Remove `trap_init()`, as it would be empty and a weak definition already exists in `init/main.c`. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-14-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split bkpt32 exception entryAda Couprie Diaz
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The BKPT32 exception can only be triggered by a BKPT instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address : we don't need to call `arm64_apply_bp_hardening()`. The handler for this exception only pends a signal and doesn't depend on any per-CPU state : we don't need to inhibit preemption, nor do we need to keep the DAIF exceptions masked, so we can unmask them earlier. Split the BKPT32 exception entry and adjust function signatures and its behaviour to match its relaxed constraints compared to other debug exceptions. We can also remove `NOKRPOBE_SYMBOL`, as this cannot lead to a kprobe recursion. This replaces the last usage of `el0_dbg()`, so remove it. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-13-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split brk64 exception entryAda Couprie Diaz
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The BRK64 instruction can only be triggered by a BRK instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address : we don't need to call `arm64_apply_bp_hardening()`. We do not need to handle the Cortex-A76 erratum #1463225 either, as it only relevant for single stepping at EL1. BRK64 does not write FAR_EL1 either, as only hardware watchpoints do so. Split the BRK64 exception entry, adjust the function signature, and its behaviour to match the lack of needed mitigations. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_brk64()` into `do_el0_brk64()` and `do_el1_brk64()`, and call them directly from the relevant entry paths. Use `die()` directly for the EL1 error path, as in `do_el1_bti()` and `do_el1_undef()`. We can also remove `NOKRPOBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion. When taking a BRK64 exception from EL0, the exception handling is safely preemptible : the only possible handler is `uprobe_brk_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. Thus we can safely unmask interrupts and enable preemption before handling the break itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled. Given that the break hook registration is handled statically in `call_break_hook` since (arm64: debug: call software break handlers statically) and that we now bypass the exception handler registration, this change renders `early_brk64` redundant : its functionality is now handled through the post-init path. This also removes the last usage of `el1_dbg()`. This also removes the last usage of `el0_dbg()` without `CONFIG_COMPAT`. Mark it `__maybe_unused`, to prevent a warning when building this patch without `CONFIG_COMPAT`, as the following patch removes `el0_dbg()`. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-12-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split hardware watchpoint exception entryAda Couprie Diaz
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. Hardware watchpoints are the only debug exceptions that will write FAR_EL1, so we need to preserve it and pass it down. However, they cannot be used to maliciously train branch predictors, so we can omit calling `arm64_bp_hardening()`, nor do they need to handle the Cortex-A76 erratum #1463225, as it only applies to single stepping exceptions. As the hardware watchpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`. Split the hardware watchpoint exception entry and adjust the behaviour to match the lack of needed mitigations. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-11-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split single stepping exception entryAda Couprie Diaz
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The single stepping exception has the most constraints : it can be exploited to train branch predictors and it needs special handling at EL1 for the Cortex-A76 erratum #1463225. We need to conserve all those mitigations. However, it does not write an address at FAR_EL1, as only hardware watchpoints do so. The single-step handler does its own signaling if it needs to and only returns 0, so we can call it directly from `entry-common.c`. Split the single stepping exception entry, adjust the function signature, keep the security mitigation and erratum handling. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_softstep()` into `do_el0_softstep()` and `do_el1_softstep()` and call them directly from the relevant entry paths. We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion. Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way. When taking a soft-step exception from EL0, most of the single stepping handling is safely preemptible : the only possible handler is `uprobe_single_step_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. However, the soft-step handler first calls `reinstall_suspended_bps()` to check if there is any hardware breakpoint or watchpoint pending or already stepped through. This cannot be preempted as it manipulates the hardware breakpoint and watchpoint registers. Move the call to `try_step_suspended_breakpoints()` to `entry-common.c` and adjust the relevant comments. We can now safely unmask interrupts before handling the step itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/ Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-10-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split hardware breakpoint exception entryAda Couprie Diaz
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. Hardware breakpoints exceptions are generated by the hardware after user configuration. As such, they can be exploited when training branch predictors outside of the userspace VA range: they still need to call `arm64_apply_bp_hardening()` if needed to mitigate against this attack. However, they do not need to handle the Cortex-A76 erratum #1463225 as it only applies to single stepping exceptions. It does not set an address in FAR_EL1 either, only the hardware watchpoint does. As the hardware breakpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`. Split the hardware breakpoint exception entry, adjust the function signature, and handling of the Cortex-A76 erratum to fit the behaviour of the exception. Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-8-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: entry: Add entry and exit functions for debug exceptionsAda Couprie Diaz
Move the `debug_exception_enter()` and `debug_exception_exit()` functions from mm/fault.c, as they are needed to split the debug exceptions entry paths from the current unified one. Make them externally visible in include/asm/exception.h until the caller in mm/fault.c is cleaned up. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-7-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-03arm64/debug: Drop redundant DBG_MDSCR_* macrosAnshuman Khandual
MDSCR_EL1 has already been defined in tools sysreg format and hence can be used in all debug monitor related call paths. But using generated sysreg definitions causes build warnings because there is a mismatch between mdscr variable (u32) and GENMASK() based masks (long unsigned int). Convert all variables handling MDSCR_EL1 register as u64 which also reflects its true width as well. -------------------------------------------------------------------------- arch/arm64/kernel/debug-monitors.c: In function ‘disable_debug_monitors’: arch/arm64/kernel/debug-monitors.c:108:13: warning: conversion from ‘long unsigned int’ to ‘u32’ {aka ‘unsigned int’} changes value from ‘18446744073709518847’ to ‘4294934527’ [-Woverflow] 108 | disable = ~MDSCR_EL1_MDE; | ^ -------------------------------------------------------------------------- While here, replace an open encoding with MDSCR_EL1_TDCC in __cpu_setup(). Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250613023646.1215700-2-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-01arm64: Implement HAVE_LIVEPATCHSong Liu
Allocate a task flag used to represent the patch pending state for the task. When a livepatch is being loaded or unloaded, the livepatch code uses this flag to select the proper version of a being patched kernel functions to use for current task. In arch/arm64/Kconfig, select HAVE_LIVEPATCH and include proper Kconfig. This is largely based on [1] by Suraj Jitindar Singh. [1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/ Cc: Suraj Jitindar Singh <surajjs@amazon.com> Cc: Torsten Duwe <duwe@suse.de> Acked-by: Miroslav Benes <mbenes@suse.cz> Tested-by: Breno Leitao <leitao@debian.org> Tested-by: Andrea della Porta <andrea.porta@suse.com> Signed-off-by: Song Liu <song@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250630174502.842486-1-song@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-27Merge branch 'for-next/sme-fixes' into for-next/coreWill Deacon
* for-next/sme-fixes: (35 commits) arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected arm64/fpsimd: ptrace: Gracefully handle errors arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state arm64/fpsimd: ptrace: Do not present register data for inactive mode arm64/fpsimd: ptrace: Save task state before generating SVE header arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data arm64/fpsimd: Make clone() compatible with ZA lazy saving arm64/fpsimd: Clear PSTATE.SM during clone() arm64/fpsimd: Consistently preserve FPSIMD state during clone() arm64/fpsimd: Remove redundant task->mm check arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return() arm64/fpsimd: Add task_smstop_sm() arm64/fpsimd: Factor out {sve,sme}_state_size() helpers arm64/fpsimd: Clarify sve_sync_*() functions arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE arm64/fpsimd: signal: Consistently read FPSIMD context arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only arm64/fpsimd: Do not discard modified SVE state ...
2025-05-08arm64/fpsimd: Do not discard modified SVE stateMark Rutland
Historically SVE state was discarded deterministically early in the syscall entry path, before ptrace is notified of syscall entry. This permitted ptrace to modify SVE state before and after the "real" syscall logic was executed, with the modified state being retained. This behaviour was changed by commit: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") That commit was intended to speed up workloads that used SVE by opportunistically leaving SVE enabled when returning from a syscall. The syscall entry logic was modified to truncate the SVE state without disabling userspace access to SVE, and fpsimd_save_user_state() was modified to discard userspace SVE state whenever in_syscall(current_pt_regs()) is true, i.e. when current_pt_regs()->syscallno != NO_SYSCALL. Leaving SVE enabled opportunistically resulted in a couple of changes to userspace visible behaviour which weren't described at the time, but are logical consequences of opportunistically leaving SVE enabled: * Signal handlers can observe the type of saved state in the signal's sve_context record. When the kernel only tracks FPSIMD state, the 'vq' field is 0 and there is no space allocated for register contents. When the kernel tracks SVE state, the 'vq' field is non-zero and the register contents are saved into the record. As a result of the above commit, 'vq' (and the presence of SVE register state) is non-deterministically zero or non-zero for a period of time after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, handlers for asynchronous events cannot expect a deterministic state. * Similarly to signal handlers, ptrace requests can observe the type of saved state in the NT_ARM_SVE and NT_ARM_SSVE regsets, as this is exposed in the header flags. As a result of the above commit, this is now in a non-deterministic state after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, debuggers would have to handle this changing at arbitrary points during program flow. Discarding the SVE state within fpsimd_save_user_state() resulted in other changes to userspace visible behaviour which are not desirable: * A ptrace tracer can modify (or create) a tracee's SVE state at syscall entry or syscall exit. As a result of the above commit, the tracee's SVE state can be discarded non-deterministically after modification, rather than being retained as it previously was. Note that for co-operative tracer/tracee pairs, the tracer may (re)initialise the tracee's state arbitrarily after the tracee sends itself an initial SIGSTOP via a syscall, so this affects realistic design patterns. * The current_pt_regs()->syscallno field can be modified via ptrace, and can be altered even when the tracee is not really in a syscall, causing non-deterministic discarding to occur in situations where this was not previously possible. Further, using current_pt_regs()->syscallno in this way is unsound: * There are data races between readers and writers of the current_pt_regs()->syscallno field. The current_pt_regs()->syscallno field is written in interruptible task context using plain C accesses, and is read in irq/softirq context using plain C accesses. These accesses are subject to data races, with the usual concerns with tearing, etc. * Writes to current_pt_regs()->syscallno are subject to compiler reordering. As current_pt_regs()->syscallno is written with plain C accesses, the compiler is free to move those writes arbitrarily relative to anything which doesn't access the same memory location. In theory this could break signal return, where prior to restoring the SVE state, restore_sigframe() calls forget_syscall(). If the write were hoisted after restore of some SVE state, that state could be discarded unexpectedly. In practice that reordering cannot happen in the absence of LTO (as cross compilation-unit function calls happen prevent this reordering), and that reordering appears to be unlikely in the presence of LTO. Additionally, since commit: f130ac0ae4412dbe ("arm64: syscall: unmask DAIF earlier for SVCs") ... DAIF is unmasked before el0_svc_common() sets regs->syscallno to the real syscall number. Consequently state may be saved in SVE format prior to this point. Considering all of the above, current_pt_regs()->syscallno should not be used to infer whether the SVE state can be discarded. Luckily we can instead use cpu_fp_state::to_save to track when it is safe to discard the SVE state: * At syscall entry, after the live SVE register state is truncated, set cpu_fp_state::to_save to FP_STATE_FPSIMD to indicate that only the FPSIMD portion is live and needs to be saved. * At syscall exit, once the task's state is guaranteed to be live, set cpu_fp_state::to_save to FP_STATE_CURRENT to indicate that TIF_SVE must be considered to determine which state needs to be saved. * Whenever state is modified, it must be saved+flushed prior to manipulation. The state will be truncated if necessary when it is saved, and reloading the state will set fp_state::to_save to FP_STATE_CURRENT, preventing subsequent discarding. This permits SVE state to be discarded *only* when it is known to have been truncated (and the non-FPSIMD portions must be zero), and ensures that SVE state is retained after it is explicitly modified. For backporting, note that this fix depends on the following commits: * b2482807fbd4 ("arm64/sme: Optimise SME exit on syscall entry") * f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") * 929fa99b1215 ("arm64/fpsimd: signal: Always save+flush state early") Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-29arm64: enable PREEMPT_LAZYMark Rutland
For an architecture to enable CONFIG_ARCH_HAS_RESCHED_LAZY, two things are required: 1) Adding a TIF_NEED_RESCHED_LAZY flag definition 2) Checking for TIF_NEED_RESCHED_LAZY in the appropriate locations 2) is handled in a generic manner by CONFIG_GENERIC_ENTRY, which isn't (yet) implemented for arm64. However, outside of core scheduler code, TIF_NEED_RESCHED_LAZY only needs to be checked on a kernel exit, meaning: o return/entry to userspace. o return/entry to guest. The return/entry to a guest is all handled by xfer_to_guest_mode_handle_work() which already does the right thing, so it can be left as-is. arm64 doesn't use common entry's exit_to_user_mode_prepare(), so update its return to user path to check for TIF_NEED_RESCHED_LAZY and call into schedule() accordingly. Link: https://lore.kernel.org/linux-rt-users/20241216190451.1c61977c@mordecai.tesarici.cz/ Link: https://lore.kernel.org/all/xhsmh4j0fl0p3.mognet@vschneid-thinkpadt14sgen2i.remote.csb/ Signed-off-by: Mark Rutland <mark.rutland@arm.com> [testdrive, _TIF_WORK_MASK fixlet and changelog.] Signed-off-by: Mike Galbraith <efault@gmx.de> [Another round of testing; changelog faff] Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250305104925.189198-2-vschneid@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-14Merge branch 'for-next/mops' into for-next/coreCatalin Marinas
* for-next/mops: : More FEAT_MOPS (memcpy instructions) uses - in-kernel routines arm64: mops: Document requirements for hypervisors arm64: lib: Use MOPS for copy_page() and clear_page() arm64: lib: Use MOPS for memcpy() routines arm64: mops: Document booting requirement for HCR_EL2.MCE2 arm64: mops: Handle MOPS exceptions from EL1 arm64: probes: Disable kprobes/uprobes on MOPS instructions # Conflicts: # arch/arm64/kernel/entry-common.c
2024-10-17arm64: mops: Handle MOPS exceptions from EL1Kristina Martsenko
We will soon be using MOPS instructions in the kernel, so wire up the exception handler to handle exceptions from EL1 caused by the copy/set operation being stopped and resumed on a different type of CPU. Add a helper for advancing the single step state machine, similarly to what the EL0 exception handler does. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20240930161051.3777828-3-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04arm64/traps: Handle GCS exceptionsMark Brown
A new exception code is defined for GCS specific faults other than standard load/store faults, for example GCS token validation failures, add handling for this. These faults are reported to userspace as segfaults with code SEGV_CPERR (protection error), mirroring the reporting for x86 shadow stack errors. GCS faults due to memory load/store operations generate data aborts with a flag set, these will be handled separately as part of the data abort handling. Since we do not currently enable GCS for EL1 we should not get any faults there but while we're at it we wire things up there, treating any GCS fault as fatal. Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-19-222b78d87eee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-29treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*Valentin Schneider
Context tracking state related symbols currently use a mix of the CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_SATE_ (e.g. CT_STATE_MASK) prefixes. Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-02-20arm64: Unmask Debug + SError in do_notify_resume()Mark Rutland
When returning to a user context, the arm64 entry code masks all DAIF exceptions before handling pending work in exit_to_user_mode_prepare() and do_notify_resume(), where it will transiently unmask all DAIF exceptions. This is a holdover from the old entry assembly, which conservatively masked all DAIF exceptions, and it's only necessary to mask interrupts at this point during the exception return path, so long as we subsequently mask all DAIF exceptions before the actual exception return. While most DAIF manipulation follows a save...restore sequence, the manipulation in do_notify_resume() is the other way around, unmasking all DAIF exceptions before masking them again. This is unfortunate as we unnecessarily mask Debug and SError exceptions, and it would be nice to remove this special case to make DAIF manipulation simpler and most consistent. This patch changes exit_to_user_mode_prepare() and do_notify_resume() to only mask interrupts while handling pending work, masking other DAIF exceptions after this has completed. This removes the unusual DAIF manipulation and allows Debug and SError exceptions to be taken for a slightly longer window during the exception return path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240206123848.1696480-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2024-02-20arm64: Move do_notify_resume() to entry-common.cMark Rutland
Currently do_notify_resume() lives in arch/arm64/kernel/signal.c, but it would make more sense for it to live in entry-common.c as it handles more than signals, and is coupled with the rest of the return-to-userspace sequence (e.g. with unusual DAIF masking that matches the exception return requirements). Move do_notify_resume() to entry-common.c. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240206123848.1696480-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2023-08-11arm64: syscall: unmask DAIF earlier for SVCsMark Rutland
For a number of historical reasons, when handling SVCs we don't unmask DAIF in el0_svc() or el0_svc_compat(), and instead do so later in el0_svc_common(). This is unfortunate and makes it harder to make changes to the DAIF management in entry-common.c as we'd like to do as cleanup and preparation for FEAT_NMI support. We can move the DAIF unmasking to entry-common.c as long as we also hoist the fp_user_discard() logic, as reasoned below. We converted the syscall trace logic from assembly to C in commit: f37099b6992a0b81 ("arm64: convert syscall trace logic to C") ... which was intended to have no functional change, and mirrored the existing assembly logic to avoid the risk of any functional regression. With the logic in C, it's clear that there is currently no reason to unmask DAIF so late within el0_svc_common(): * The thread flags are read prior to unmasking DAIF, but are not consumed until after DAIF is unmasked, and we don't perform a read-modify-write sequence of the thread flags for which we might need to serialize against an IPI modifying the flags. Similarly, for any thread flags set by other threads, whether DAIF is masked or not has no impact. The read_thread_flags() helpers performs a single-copy-atomic read of the flags, and so this can safely be moved after unmasking DAIF. * The pt_regs::orig_x0 and pt_regs::syscallno fields are neither consumed nor modified by the handler for any DAIF exception (e.g. these do not exist in the `perf_event_arm_regs` enum and are not sampled by perf in its IRQ handler). Thus, the manipulation of pt_regs::orig_x0 and pt_regs::syscallno can safely be moved after unmasking DAIF. Given the above, we can safely hoist unmasking of DAIF out of el0_svc_common(), and into its immediate callers: do_el0_svc() and do_el0_svc_compat(). Further: * In do_el0_svc(), we sample the syscall number from pt_regs::regs[8]. This is not modified by the handler for any DAIF exception, and thus can safely be moved after unmasking DAIF. As fp_user_discard() operates on the live FP/SVE/SME register state, this needs to occur before we clear DAIF.IF, as interrupts could result in preemption which would cause this state to become foreign. As fp_user_discard() is the first function called within do_el0_svc(), it has no dependency on other parts of do_el0_svc() and can be moved earlier so long as it is called prior to unmasking DAIF.IF. * In do_el0_svc_compat(), we sample the syscall number from pt_regs::regs[7]. This is not modified by the handler for any DAIF exception, and thus can safely be moved after unmasking DAIF. Compat threads cannot use SVE or SME, so there's no need for el0_svc_compat() to call fp_user_discard(). Given the above, we can safely hoist the unmasking of DAIF out of do_el0_svc() and do_el0_svc_compat(), and into their immediate callers: el0_svc() and el0_svc_compat(), so long a we also hoist fp_user_discard() into el0_svc(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230808101148.1064172-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-23Merge branches 'for-next/kpti', 'for-next/missing-proto-warn', ↵Catalin Marinas
'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver perf/arm_cspmu: Decouple APMT dependency perf/arm_cspmu: Clean up ACPI dependency ACPI/APMT: Don't register invalid resource perf/arm_cspmu: Fix event attribute type perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used drivers/perf: hisi: Don't migrate perf to the CPU going to teardown drivers/perf: apple_m1: Force 63bit counters for M2 CPUs perf/arm-cmn: Fix DTC reset perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust perf/arm-cci: Slightly optimize cci_pmu_sync_counters() * for-next/kpti: : Simplify KPTI trampoline exit code arm64: entry: Simplify tramp_alias macro and tramp_exit routine arm64: entry: Preserve/restore X29 even for compat tasks * for-next/missing-proto-warn: : Address -Wmissing-prototype warnings arm64: add alt_cb_patch_nops prototype arm64: move early_brk64 prototype to header arm64: signal: include asm/exception.h arm64: kaslr: add kaslr_early_init() declaration arm64: flush: include linux/libnvdimm.h arm64: module-plts: inline linux/moduleloader.h arm64: hide unused is_valid_bugaddr() arm64: efi: add efi_handle_corrupted_x18 prototype arm64: cpuidle: fix #ifdef for acpi functions arm64: kvm: add prototypes for functions called in asm arm64: spectre: provide prototypes for internal functions arm64: move cpu_suspend_set_dbg_restorer() prototype to header arm64: avoid prototype warnings for syscalls arm64: add scs_patch_vmlinux prototype arm64: xor-neon: mark xor_arm64_neon_*() static * for-next/iss2-decode: : Add decode of ISS2 to data abort reports arm64/esr: Add decode of ISS2 to data abort reporting arm64/esr: Use GENMASK() for the ISS mask * for-next/kselftest: : Various arm64 kselftest improvements kselftest/arm64: Log signal code and address for unexpected signals kselftest/arm64: Add a smoke test for ptracing hardware break/watch points * for-next/misc: : Miscellaneous patches arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe arm64: hibernate: remove WARN_ON in save_processor_state arm64/fpsimd: Exit streaming mode when flushing tasks arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes arm64: consolidate rox page protection logic arm64: set __exception_irq_entry with __irq_entry as a default arm64: syscall: unmask DAIF for tracing status arm64: lockdep: enable checks for held locks when returning to userspace arm64/cpucaps: increase string width to properly format cpucaps.h arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature * for-next/feat_mops: : Support for ARMv8.8 memcpy instructions in userspace kselftest/arm64: add MOPS to hwcap test arm64: mops: allow disabling MOPS from the kernel command line arm64: mops: detect and enable FEAT_MOPS arm64: mops: handle single stepping after MOPS exception arm64: mops: handle MOPS exceptions KVM: arm64: hide MOPS from guests arm64: mops: don't disable host MOPS instructions from EL2 arm64: mops: document boot requirements for MOPS KVM: arm64: switch HCRX_EL2 between host and guest arm64: cpufeature: detect FEAT_HCX KVM: arm64: initialize HCRX_EL2 * for-next/module-alloc: : Make the arm64 module allocation code more robust (clean-up, VA range expansion) arm64: module: rework module VA range selection arm64: module: mandate MODULE_PLTS arm64: module: move module randomization to module.c arm64: kaslr: split kaslr/module initialization arm64: kasan: remove !KASAN_VMALLOC remnants arm64: module: remove old !KASAN_VMALLOC logic * for-next/sysreg: (21 commits) : More sysreg conversions to automatic generation arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation arm64/sysreg: Convert TRBSR_EL1 register to automatic generation arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format arm64/sysreg: Convert OSECCR_EL1 to automatic generation arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation arm64/sysreg: Convert OSLAR_EL1 to automatic generation arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1 arm64/sysreg: Convert MDSCR_EL1 to automatic register generation ... * for-next/cpucap: : arm64 cpucap clean-up arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities() arm64: cpufeature: use cpucap naming arm64: alternatives: use cpucap naming arm64: standardise cpucap bitmap names * for-next/acpi: : Various arm64-related ACPI patches ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init() * for-next/kdump: : Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64 arm64: add kdump.rst into index.rst Documentation: add kdump.rst to present crashkernel reservation on arm64 arm64: kdump: simplify the reservation behaviour of crashkernel=,high * for-next/acpi-doc: : Update ACPI documentation for Arm systems Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference * for-next/doc: : arm64 documentation updates Documentation/arm64: Add ptdump documentation * for-next/tpidr2-fix: : Fix the TPIDR2_EL0 register restoring on sigreturn kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state
2023-06-06arm64: lockdep: enable checks for held locks when returning to userspaceEric Chan
Currently arm64 doesn't use CONFIG_GENERIC_ENTRY and doesn't call lockdep_sys_exit() when returning to userspace. This means that lockdep won't check for held locks when returning to userspace, which would be useful to detect kernel bugs. Call lockdep_sys_exit() when returning to userspace, enabling checking for held locks. At the same time, rename arm64's prepare_exit_to_user_mode() to exit_to_user_mode_prepare() to more clearly align with the naming in the generic entry code. Signed-off-by: Eric Chan <ericchancf@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531090909.357047-1-ericchancf@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-05arm64: mops: handle MOPS exceptionsKristina Martsenko
The memory copy/set instructions added as part of FEAT_MOPS can take an exception (e.g. page fault) part-way through their execution and resume execution afterwards. If however the task is re-scheduled and execution resumes on a different CPU, then the CPU may take a new type of exception to indicate this. This is because the architecture allows two options (Option A and Option B) to implement the instructions and a heterogeneous system can have different implementations between CPUs. In this case the OS has to reset the registers and restart execution from the prologue instruction. The algorithm for doing this is provided as part of the Arm ARM. Add an exception handler for the new exception and wire it up for userspace tasks. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20230509142235.3284028-8-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-04-14arm64/cpu: Mark cpu_park_loop() and friends __noreturnJosh Poimboeuf
In preparation for marking panic_smp_self_stop() __noreturn across the kernel, first mark the arm64 implementation of cpu_park_loop() and related functions __noreturn. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/55787d3193ea3e295ccbb097abfab0a10ae49d45.1681342859.git.jpoimboe@kernel.org
2022-12-06Merge branch 'for-next/undef-traps' into for-next/coreWill Deacon
* for-next/undef-traps: arm64: armv8_deprecated: fix unused-function error arm64: armv8_deprecated: rework deprected instruction handling arm64: armv8_deprecated: move aarch32 helper earlier arm64: armv8_deprecated move emulation functions arm64: armv8_deprecated: fold ops into insn_emulation arm64: rework EL0 MRS emulation arm64: factor insn read out of call_undef_hook() arm64: factor out EL1 SSBS emulation hook arm64: split EL0/EL1 UNDEF handlers arm64: allow kprobes on EL0 handlers
2022-11-15arm64: split EL0/EL1 UNDEF handlersMark Rutland
In general, exceptions taken from EL1 need to be handled separately from exceptions taken from EL0, as the logic to handle the two cases can be significantly divergent, and exceptions taken from EL1 typically have more stringent requirements on locking and instrumentation. Subsequent patches will rework the way EL1 UNDEFs are handled in order to address longstanding soundness issues with instrumentation and RCU. In preparation for that rework, this patch splits the existing do_undefinstr() handler into separate do_el0_undef() and do_el1_undef() handlers. Prior to this patch, do_undefinstr() was marked with NOKPROBE_SYMBOL(), preventing instrumentation via kprobes. However, do_undefinstr() invokes other code which can be instrumented, and: * For UNDEFINED exceptions taken from EL0, there is no risk of recursion within kprobes. Therefore it is safe for do_el0_undef to be instrumented with kprobes, and it does not need to be marked with NOKPROBE_SYMBOL(). * For UNDEFINED exceptions taken from EL1, either: (a) The exception is has been taken when manipulating SSBS; these cases are limited and do not occur within code that can be invoked recursively via kprobes. Hence, in these cases instrumentation with kprobes is benign. (b) The exception has been taken for an unknown reason, as other than manipulating SSBS we do not expect to take UNDEFINED exceptions from EL1. Any handling of these exception is best-effort. ... and in either case, marking do_el1_undef() with NOKPROBE_SYMBOL() isn't sufficient to prevent recursion via kprobes as functions it calls (including die()) are instrumentable via kprobes. Hence, it's not worthwhile to mark do_el1_undef() with NOKPROBE_SYMBOL(). The same applies to do_el1_bti() and do_el1_fpac(), so their NOKPROBE_SYMBOL() annotations are also removed. Aside from the new instrumentability, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221019144123.612388-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15arm64: allow kprobes on EL0 handlersMark Rutland
Currently do_sysinstr() and do_cp15instr() are marked with NOKPROBE_SYMBOL(). However, these are only called for exceptions taken from EL0, and there is no risk of recursion in kprobes, so this is not necessary. Remove the NOKPROBE_SYMBOL() annotation, and rename the two functions to more clearly indicate that these are solely for exceptions taken from EL0, better matching the names used by the lower level entry points in entry-common.c. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221019144123.612388-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08arm64: entry: Fix typoMukesh Ojha
Fix the following typo in entry-common.c intrumentable => instrumentable Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1667027268-1255-1-git-send-email-quic_mojha@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-01arm64: entry: avoid kprobe recursionMark Rutland
The cortex_a76_erratum_1463225_debug_handler() function is called when handling debug exceptions (and synchronous exceptions from BRK instructions), and so is called when a probed function executes. If the compiler does not inline cortex_a76_erratum_1463225_debug_handler(), it can be probed. If cortex_a76_erratum_1463225_debug_handler() is probed, any debug exception or software breakpoint exception will result in recursive exceptions leading to a stack overflow. This can be triggered with the ftrace multiple_probes selftest, and as per the example splat below. This is a regression caused by commit: 6459b8469753e9fe ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround") ... which removed the NOKPROBE_SYMBOL() annotation associated with the function. My intent was that cortex_a76_erratum_1463225_debug_handler() would be inlined into its caller, el1_dbg(), which is marked noinstr and cannot be probed. Mark cortex_a76_erratum_1463225_debug_handler() as __always_inline to ensure this. Example splat prior to this patch (with recursive entries elided): | # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | # echo p do_el0_svc >> /sys/kernel/debug/tracing/kprobe_events | # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable | Insufficient stack space to handle exception! | ESR: 0x0000000096000047 -- DABT (current EL) | FAR: 0xffff800009cefff0 | Task stack: [0xffff800009cf0000..0xffff800009cf4000] | IRQ stack: [0xffff800008000000..0xffff800008004000] | Overflow stack: [0xffff00007fbc00f0..0xffff00007fbc10f0] | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : arm64_enter_el1_dbg+0x4/0x20 | lr : el1_dbg+0x24/0x5c | sp : ffff800009cf0000 | x29: ffff800009cf0000 x28: ffff000002c74740 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 00000000604003c5 x22: ffff80000801745c x21: 0000aaaac95ac068 | x20: 00000000f2000004 x19: ffff800009cf0040 x18: 0000000000000000 | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 | x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 | x11: 0000000000000010 x10: ffff800008c87190 x9 : ffff800008ca00d0 | x8 : 000000000000003c x7 : 0000000000000000 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000000043a4 | x2 : 00000000f2000004 x1 : 00000000f2000004 x0 : ffff800009cf0040 | Kernel panic - not syncing: kernel stack overflow | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xe4/0x104 | show_stack+0x18/0x4c | dump_stack_lvl+0x64/0x7c | dump_stack+0x18/0x38 | panic+0x14c/0x338 | test_taint+0x0/0x2c | panic_bad_stack+0x104/0x118 | handle_bad_stack+0x34/0x48 | __bad_stack+0x78/0x7c | arm64_enter_el1_dbg+0x4/0x20 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | do_el0_svc+0x0/0x28 | el0t_64_sync_handler+0x84/0xf0 | el0t_64_sync+0x18c/0x190 | Kernel Offset: disabled | CPU features: 0x0080,00005021,19001080 | Memory Limit: none | ---[ end Kernel panic - not syncing: kernel stack overflow ]--- With this patch, cortex_a76_erratum_1463225_debug_handler() is inlined into el1_dbg(), and el1_dbg() cannot be probed: | # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | sh: write error: No such file or directory | # grep -w cortex_a76_erratum_1463225_debug_handler /proc/kallsyms | wc -l | 0 | # echo p el1_dbg > /sys/kernel/debug/tracing/kprobe_events | sh: write error: Invalid argument | # grep -w el1_dbg /proc/kallsyms | wc -l | 1 Fixes: 6459b8469753 ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround") Cc: <stable@vger.kernel.org> # 5.12.x Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221017090157.2881408-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16arm64: rework BTI exception handlingMark Rutland
If a BTI exception is taken from EL1, the entry code will treat this as an unhandled exception and will panic() the kernel. This is inconsistent with the way we handle FPAC exceptions, which have a dedicated handler and only necessarily kill the thread from which the exception was taken from, and we don't log all the information that could be relevant to debug the issue. The code in do_bti() has: BUG_ON(!user_mode(regs)); ... and it seems like the intent was to call this for EL1 BTI exceptions, as with FPAC, but this was omitted due to an oversight. This patch adds separate EL0 and EL1 BTI exception handlers, with the latter calling die() directly to report the original context the BTI exception was taken from. This matches our handling of FPAC exceptions. Prior to this patch, a BTI failure is reported as: | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000002 -- BTI | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00131-g7d937ff0221d-dirty #9 | Hardware name: linux,dummy-virt (DT) | pstate: 20400809 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c) | pc : test_bti_callee+0x4/0x10 | lr : test_bti_caller+0x1c/0x28 | sp : ffff80000800bdf0 | x29: ffff80000800bdf0 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: ffff80000a2b8000 x22: 0000000000000000 x21: 0000000000000000 | x20: ffff8000099fa5b0 x19: ffff800009ff7000 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000041a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000040000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000f83 | x5 : ffff80000a2b6000 x4 : ffff0000028d0000 x3 : ffff800009f78378 | x2 : 0000000000000000 x1 : 0000000040210000 x0 : ffff8000080257e4 | Kernel panic - not syncing: Unhandled exception | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00131-g7d937ff0221d-dirty #9 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace.part.0+0xcc/0xe0 | show_stack+0x18/0x5c | dump_stack_lvl+0x64/0x80 | dump_stack+0x18/0x34 | panic+0x170/0x360 | arm64_exit_nmi.isra.0+0x0/0x80 | el1h_64_sync_handler+0x64/0xd0 | el1h_64_sync+0x64/0x68 | test_bti_callee+0x4/0x10 | smp_cpus_done+0xb0/0xbc | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 With this patch applied, a BTI failure is reported as: | Internal error: Oops - BTI: 0000000034000002 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00132-g0ad98265d582-dirty #8 | Hardware name: linux,dummy-virt (DT) | pstate: 20400809 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c) | pc : test_bti_callee+0x4/0x10 | lr : test_bti_caller+0x1c/0x28 | sp : ffff80000800bdf0 | x29: ffff80000800bdf0 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: ffff80000a2b8000 x22: 0000000000000000 x21: 0000000000000000 | x20: ffff8000099fa5b0 x19: ffff800009ff7000 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000041a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000040000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000f83 | x5 : ffff80000a2b6000 x4 : ffff0000028d0000 x3 : ffff800009f78378 | x2 : 0000000000000000 x1 : 0000000040210000 x0 : ffff800008025804 | Call trace: | test_bti_callee+0x4/0x10 | smp_cpus_done+0xb0/0xbc | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 | Code: d50323bf d53cd040 d65f03c0 d503233f (d50323bf) Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220913101732.3925290-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16arm64: rework FPAC exception handlingMark Rutland
If an FPAC exception is taken from EL1, the entry code will call do_ptrauth_fault(), where due to: BUG_ON(!user_mode(regs)) ... the kernel will report a problem within do_ptrauth_fault() rather than reporting the original context the FPAC exception was taken from. The pt_regs and ESR value reported will be from within do_ptrauth_fault() and the code dump will be for the BRK in BUG_ON(), which isn't sufficient to debug the cause of the original exception. This patch makes the reporting better by having separate EL0 and EL1 FPAC exception handlers, with the latter calling die() directly to report the original context the FPAC exception was taken from. Note that we only need to prevent kprobes of the EL1 FPAC handler, since the EL0 FPAC handler cannot be called recursively. For consistency with do_el0_svc*(), I've named the split functions do_el{0,1}_fpac() rather than do_el{0,1}_ptrauth_fault(). I've also clarified the comment to not imply there are casues other than FPAC exceptions. Prior to this patch FPAC exceptions are reported as: | kernel BUG at arch/arm64/kernel/traps.c:517! | Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00130-g9c8a180a1cdf-dirty #12 | Hardware name: FVP Base RevC (DT) | pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : do_ptrauth_fault+0x3c/0x40 | lr : el1_fpac+0x34/0x54 | sp : ffff80000a3bbc80 | x29: ffff80000a3bbc80 x28: ffff0008001d8000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 0000000020400009 x22: ffff800008f70fa4 x21: ffff80000a3bbe00 | x20: 0000000072000000 x19: ffff80000a3bbcb0 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000081a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000080000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000783 | x5 : ffff80000a3bbcb0 x4 : ffff0008001d8000 x3 : 0000000072000000 | x2 : 0000000000000000 x1 : 0000000020400009 x0 : ffff80000a3bbcb0 | Call trace: | do_ptrauth_fault+0x3c/0x40 | el1h_64_sync_handler+0xc4/0xd0 | el1h_64_sync+0x64/0x68 | test_pac+0x8/0x10 | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 | Code: 97fffe5e a8c17bfd d50323bf d65f03c0 (d4210000) With this patch applied FPAC exceptions are reported as: | Internal error: Oops - FPAC: 0000000072000000 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00132-g78846e1c4757-dirty #11 | Hardware name: FVP Base RevC (DT) | pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : test_pac+0x8/0x10 | lr : 0x0 | sp : ffff80000a3bbe00 | x29: ffff80000a3bbe00 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: ffff80000a2c8000 x22: 0000000000000000 x21: 0000000000000000 | x20: ffff8000099fa5b0 x19: ffff80000a007000 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000081a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000080000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000783 | x5 : ffff80000a2c6000 x4 : ffff0008001d8000 x3 : ffff800009f88378 | x2 : 0000000000000000 x1 : 0000000080210000 x0 : ffff000001a90000 | Call trace: | test_pac+0x8/0x10 | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 | Code: d50323bf d65f03c0 d503233f aa1f03fe (d50323bf) Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220913101732.3925290-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16arm64: consistently pass ESR_ELx to die()Mark Rutland
Currently, bug_handler() and kasan_handler() call die() with '0' as the 'err' value, whereas die_kernel_fault() passes the ESR_ELx value. For consistency, this patch ensures we always pass the ESR_ELx value to die(). As this is only called for exceptions taken from kernel mode, there should be no user-visible change as a result of this patch. For UNDEFINED exceptions, I've had to modify do_undefinstr() and its callers to pass the ESR_ELx value. In all cases the ESR_ELx value had already been read and was available. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220913101732.3925290-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-07-05context_tracking: Take NMI eqs entrypoints over RCUFrederic Weisbecker
The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the NMI extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05context_tracking: Take IRQ eqs entrypoints over RCUFrederic Weisbecker
The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the IRQ extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. [ paulmck: Apply Stephen Rothwell feedback from -next. ] [ paulmck: Apply Nathan Chancellor feedback. ] Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-05-24Merge tag 'locking-core-2022-05-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - rwsem cleanups & optimizations/fixes: - Conditionally wake waiters in reader/writer slowpaths - Always try to wake waiters in out_nolock path - Add try_cmpxchg64() implementation, with arch optimizations - and use it to micro-optimize sched_clock_{local,remote}() - Various force-inlining fixes to address objdump instrumentation-check warnings - Add lock contention tracepoints: lock:contention_begin lock:contention_end - Misc smaller fixes & cleanups * tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote} locking/atomic/x86: Introduce arch_try_cmpxchg64 locking/atomic: Add generic try_cmpxchg64 support futex: Remove a PREEMPT_RT_FULL reference. locking/qrwlock: Change "queue rwlock" to "queued rwlock" lockdep: Delete local_irq_enable_in_hardirq() locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning locking: Apply contention tracepoints in the slow path locking: Add lock contention tracepoints locking/rwsem: Always try to wake waiters in out_nolock path locking/rwsem: Conditionally wake waiters in reader/writer slowpaths locking/rwsem: No need to check for handoff bit if wait queue empty lockdep: Fix -Wunused-parameter for _THIS_IP_ x86/mm: Force-inline __phys_addr_nodebug() x86/kvm/svm: Force-inline GHCB accessors task_stack, x86/cea: Force-inline stack helpers
2022-05-20Merge branch 'for-next/esr-elx-64-bit' into for-next/coreCatalin Marinas
* for-next/esr-elx-64-bit: : Treat ESR_ELx as a 64-bit register. KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high KVM: arm64: Treat ESR_EL2 as a 64-bit register arm64: Treat ESR_ELx as a 64-bit register arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-04-29arm64: Treat ESR_ELx as a 64-bit registerAlexandru Elisei
In the initial release of the ARM Architecture Reference Manual for ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture, when they became 64-bit registers, with bits [63:32] defined as RES0. In version G.a, a new field was added to ESR_ELx, ISS2, which covers bits [36:32]. This field is used when the Armv8.7 extension FEAT_LS64 is implemented. As a result of the evolution of the register width, Linux stores it as both a 64-bit value and a 32-bit value, which hasn't affected correctness so far as Linux only uses the lower 32 bits of the register. Make the register type consistent and always treat it as 64-bit wide. The register is redefined as an "unsigned long", which is an unsigned double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1, page 14). The type was chosen because "unsigned int" is the most frequent type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx in exception handling, is also declared as "unsigned long". The 64-bit type also makes adding support for architectural features that use fields above bit 31 easier in the future. The KVM hypervisor will receive a similar update in a subsequent patch. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64/sme: Implement traps and syscall handling for SMEMark Brown
By default all SME operations in userspace will trap. When this happens we allocate storage space for the SME register state, set up the SVE registers and disable traps. We do not need to initialize ZA since the architecture guarantees that it will be zeroed when enabled and when we trap ZA is disabled. On syscall we exit streaming mode if we were previously in it and ensure that all but the lower 128 bits of the registers are zeroed while preserving the state of ZA. This follows the aarch64 PCS for SME, ZA state is preserved over a function call and streaming mode is exited. Since the traps for SME do not distinguish between streaming mode SVE and ZA usage if ZA is in use rather than reenabling traps we instead zero the parts of the SVE registers not shared with FPSIMD and leave SME enabled, this simplifies handling SME traps. If ZA is not in use then we reenable SME traps and fall through to normal handling of SVE. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220419112247.711548-17-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-05lockdep: Fix -Wunused-parameter for _THIS_IP_Nick Desaulniers
While looking into a bug related to the compiler's handling of addresses of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep. Drive by cleanup. -Wunused-parameter: kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip' kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip' kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip' Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com
2022-03-22Merge tag 'sched-core-2022-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder *_api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer * tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) sched/headers: ARM needs asm/paravirt_api_clock.h too sched/numa: Fix boot crash on arm64 systems headers/prep: Fix header to build standalone: <linux/psi.h> sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y cgroup: Fix suspicious rcu_dereference_check() usage warning sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity() sched/deadline,rt: Remove unused functions for !CONFIG_SMP sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy() sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file sched/deadline: Remove unused def_dl_bandwidth sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE sched/tracing: Don't re-read p->state when emitting sched_switch event sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race sched/cpuacct: Remove redundant RCU read lock sched/cpuacct: Optimize away RCU read lock sched/cpuacct: Fix charge percpu cpuusage sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies ...
2022-02-22arm64: mte: avoid clearing PSTATE.TCO on entry unless necessaryPeter Collingbourne
On some microarchitectures, clearing PSTATE.TCO is expensive. Clearing TCO is only necessary if in-kernel MTE is enabled, or if MTE is enabled in the userspace process in synchronous (or, soon, asymmetric) mode, because we do not report uaccess faults to userspace in none or asynchronous modes. Therefore, adjust the kernel entry code to clear TCO only if necessary. Because it is now possible to switch to a task in which TCO needs to be clear from a task in which TCO is set, we also need to do the same thing on task switch. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I52d82a580bd0500d420be501af2c35fa8c90729e Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220219012945.894950-2-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-19arm64: Support PREEMPT_DYNAMICMark Rutland
This patch enables support for PREEMPT_DYNAMIC on arm64, allowing the preemption model to be chosen at boot time. Specifically, this patch selects HAVE_PREEMPT_DYNAMIC_KEY, so that each preemption function is an out-of-line call with an early return depending upon a static key. This leaves almost all the codegen up to the compiler, and side-steps a number of pain points with static calls (e.g. interaction with CFI schemes). This should have no worse overhead than using non-inline static calls, as those use out-of-line trampolines with early returns. For example, the dynamic_cond_resched() wrapper looks as follows when enabled. When disabled, the first `B` is replaced with a `NOP`, resulting in an early return. | <dynamic_cond_resched>: | bti c | b <dynamic_cond_resched+0x10> // or `nop` | mov w0, #0x0 | ret | mrs x0, sp_el0 | ldr x0, [x0, #8] | cbnz x0, <dynamic_cond_resched+0x8> | paciasp | stp x29, x30, [sp, #-16]! | mov x29, sp | bl <preempt_schedule_common> | mov w0, #0x1 | ldp x29, x30, [sp], #16 | autiasp | ret ... compared to the regular form of the function: | <__cond_resched>: | bti c | mrs x0, sp_el0 | ldr x1, [x0, #8] | cbz x1, <__cond_resched+0x18> | mov w0, #0x0 | ret | paciasp | stp x29, x30, [sp, #-16]! | mov x29, sp | bl <preempt_schedule_common> | mov w0, #0x1 | ldp x29, x30, [sp], #16 | autiasp | ret Since arm64 does not yet use the generic entry code, we must define our own `sk_dynamic_irqentry_exit_cond_resched`, which will be enabled/disabled by the common code in kernel/sched/core.c. All other preemption functions and associated static keys are defined there. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220214165216.2231574-8-mark.rutland@arm.com
2022-02-19arm64: entry: Centralize preemption decisionMark Rutland
For historical reasons, the decision of whether or not to preempt is spread across arm64_preempt_schedule_irq() and __el1_irq(), and it would be clearer if this were all in one place. Also, arm64_preempt_schedule_irq() calls lockdep_assert_irqs_disabled(), but this is redundant, as we have a subsequent identical assertion in __exit_to_kernel_mode(), and preempt_schedule_irq() will BUG_ON(!irqs_disabled()) anyway. This patch removes the redundant assertion and centralizes the preemption decision making within arm64_preempt_schedule_irq(). Other than the slight change to assertion behaviour, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220214165216.2231574-7-mark.rutland@arm.com
2021-12-01arm64: Snapshot thread flagsMark Rutland
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race. To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not. Convert them all to the new flag accessor helpers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20211129130653.2037928-7-mark.rutland@arm.com
2021-10-26irq: arm64: perform irqentry in entry codeMark Rutland
In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/arm64 perform all the irqentry accounting in its entry code. As arch/arm64 already performs portions of the irqentry logic in enter_from_kernel_mode() and exit_to_kernel_mode(), including rcu_irq_{enter,exit}(), the only additional calls that need to be made are to irq_{enter,exit}_rcu(). Removing the calls to rcu_irq_{enter,exit}() from handle_domain_irq() ensures that we inform RCU once per IRQ entry and will correctly identify quiescent periods. Since we should not call irq_{enter,exit}_rcu() when entering a pseudo-NMI, el1_interrupt() is reworked to have separate __el1_irq() and __el1_pnmi() paths for regular IRQ and psuedo-NMI entry, with irq_{enter,exit}_irq() only called for the former. In preparation for removing HANDLE_DOMAIN_IRQ, the irq regs are managed in do_interrupt_handler() for both regular IRQ and pseudo-NMI. This is currently redundant, but not harmful. For clarity the preemption logic is moved into __el1_irq(). We should never preempt within a pseudo-NMI, and arm64_enter_nmi() already enforces this by incrementing the preempt_count, but it's clearer if we never invoke the preemption logic when entering a pseudo-NMI. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Pingfan Liu <kernelfans@gmail.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org>
2021-08-05arm64: entry: call exit_to_user_mode() from CMark Rutland
When handling an exception from EL0, we perform the entry work in that exception's C handler, and once the C handler has finished, we return back to the entry assembly. Subsequently in the common `ret_to_user` assembly we perform the exit work that balances with the entry work. This can be somewhat difficult to follow, and makes it hard to rework the return paths (e.g. to pass additional context to the exit code, or to have exception return logic for specific exceptions). This patch reworks the entry code such that each EL0 C exception handler is responsible for both the entry and exit work. This clearly balances the two (and will permit additional variation in future), and avoids an unnecessary bounce between assembly and C in the common case, leaving `ret_from_fork` as the only place assembly has to call the exit code. This means that the exit work is now inlined into the C handler, which is already the case for the entry work, and allows the compiler to generate better code (e.g. by immediately returning when there is no exit work to perform). To align with other exception entry/exit helpers, enter_from_user_mode() is updated to take the EL0 pt_regs as a parameter, though this is currently unused. There should be no functional change as a result of this patch. However, this should lead to slightly better backtraces when an error is encountered within do_notify_resume(), as the C handler should appear in the backtrace, indicating the specific exception that the kernel was entered with. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: move bulk of ret_to_user to CMark Rutland
In `ret_to_user` we perform some conditional work depending on the thread flags, then perform some IRQ/context tracking which is intended to balance with the IRQ/context tracking performed in the entry C code. For simplicity and consistency, it would be preferable to move this all to C. As a step towards that, this patch moves the conditional work and IRQ/context tracking into a C helper function. To aid bisectability, this is called from the `ret_to_user` assembly, and a subsequent patch will move the call to C code. As local_daif_mask() handles all necessary tracing and PMR manipulation, we no longer need to handle this explicitly. As we call exit_to_user_mode() directly, the `user_enter_irqoff` macro is no longer used, and can be removed. As enter_from_user_mode() and exit_to_user_mode() are no longer called from assembly, these can be made static, and as these are typically very small, they are marked __always_inline to avoid the overhead of a function call. For now, enablement of single-step is left in entry.S, and for this we still need to read the flags in ret_to_user(). It is safe to read this separately as TIF_SINGLESTEP is not part of _TIF_WORK_MASK. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-4-mark.rutland@arm.com [catalin.marinas@arm.com: removed unused gic_prio_kentry_setup macro] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: clarify entry/exit helpersMark Rutland
When entering an exception, we must perform irq/context state management before we can use instrumentable C code. Similarly, when exiting an exception we cannot use instrumentable C code after we perform irq/context state management. Originally, we'd intended that the enter_from_*() and exit_to_*() helpers would enforce this by virtue of being the first and last functions called, respectively, in an exception handler. However, as they now call instrumentable code themselves, this is not as clearly true. To make this more robust, this patch splits the irq/context state management into separate helpers, with all the helpers commented to make their intended purpose more obvious. In exit_to_kernel_mode() we'll now check TFSR_EL1 before we assert that IRQs are disabled, but this ordering is not important, and other than this there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-3-mark.rutland@arm.com [catalin.marinas@arm.com: comment typos fix-up] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: consolidate entry/exit helpersMark Rutland
To make the various entry/exit helpers easier to understand and easier to compare, this patch moves all the entry/exit helpers to be adjacent at the top of entry-common.c, rather than being spread out throughout the file. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>