Age | Commit message (Collapse) | Author |
|
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits)
drivers/perf: hisi: Support PMUs with no interrupt
drivers/perf: hisi: Relax the event number check of v2 PMUs
drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
drivers/perf: hisi: Simplify the probe process for each DDRC version
perf/arm-ni: Support sharing IRQs within an NI instance
perf/arm-ni: Consolidate CPU affinity handling
perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation
perf/cxlpmu: Remove unintended newline from IRQ name format string
perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()
perf: arm_spe: Relax period restriction
perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
KVM: arm64: nvhe: Disable branch generation in nVHE guests
arm64: Handle BRBE booting requirements
arm64/sysreg: Add BRBE registers and fields
perf/arm: Add missing .suppress_bind_attrs
perf/arm-cmn: Reduce stack usage during discovery
perf: imx9_perf: make the read-only array mask static const
perf/arm-cmn: Broaden module description for wider interconnect support
...
* for-next/livepatch:
: Support for HAVE_LIVEPATCH on arm64
arm64: Kconfig: Keep selects somewhat alphabetically ordered
arm64: Implement HAVE_LIVEPATCH
arm64: stacktrace: Implement arch_stack_walk_reliable()
arm64: stacktrace: Check kretprobe_find_ret_addr() return value
arm64/module: Use text-poke API for late relocations.
* for-next/user-contig-bbml2:
: Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts
arm64/mm: Elide tlbi in contpte_convert() under BBML2
iommu/arm: Add BBM Level 2 smmu feature
arm64: Add BBM Level 2 cpu feature
arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type
* for-next/misc:
: Miscellaneous arm64 patches
arm64/gcs: task_gcs_el0_enable() should use passed task
arm64: signal: Remove ISB when resetting POR_EL0
arm64/mm: Drop redundant addr increment in set_huge_pte_at()
arm64: Mark kernel as tainted on SAE and SError panic
arm64/gcs: Don't call gcs_free() when releasing task_struct
arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y
arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get
arm64: pi: use 'targets' instead of extra-y in Makefile
* for-next/acpi:
: Various ACPI arm64 changes
ACPI: Suppress misleading SPCR console message when SPCR table is absent
ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled
* for-next/debug-entry:
: Simplify the debug exception entry path
arm64: debug: remove debug exception registration infrastructure
arm64: debug: split bkpt32 exception entry
arm64: debug: split brk64 exception entry
arm64: debug: split hardware watchpoint exception entry
arm64: debug: split single stepping exception entry
arm64: debug: refactor reinstall_suspended_bps()
arm64: debug: split hardware breakpoint exception entry
arm64: entry: Add entry and exit functions for debug exceptions
arm64: debug: remove break/step handler registration infrastructure
arm64: debug: call step handlers statically
arm64: debug: call software breakpoint handlers statically
arm64: refactor aarch32_break_handler()
arm64: debug: clean up single_step_handler logic
* for-next/feat_mte_tagged_far:
: Support for reporting the non-address bits during a synchronous MTE tag check fault
kselftest/arm64/mte: Add mtefar tests on check_mmap_options
kselftest/arm64/mte: Refactor check_mmap_option test
kselftest/arm64/mte: Add verification for address tag in signal handler
kselftest/arm64/mte: Add address tag related macro and function
kselftest/arm64/mte: Check MTE_FAR feature is supported
kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS
kselftest/arm64: Add MTE_FAR hwcap test
KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest
arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature
* for-next/kselftest:
: Kselftest updates for arm64
kselftest/arm64: Handle attempts to disable SM on SME only systems
kselftest/arm64: Fix SVE write data generation for SME only systems
kselftest/arm64: Test SME on SME only systems in fp-ptrace
kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
kselftest/arm64: Allow sve-ptrace to run on SME only systems
kselftest/arm4: Provide local defines for AT_HWCAP3
kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
kselftest/arm64: Fix check for setting new VLs in sve-ptrace
kselftest/arm64: Convert tpidr2 test to use kselftest.h
* for-next/mdscr-cleanup:
: Drop redundant DBG_MDSCR_* macros
KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t
arm64/debug: Drop redundant DBG_MDSCR_* macros
* for-next/vmap-stack:
: Force VMAP_STACK on arm64
arm64: remove CONFIG_VMAP_STACK checks from entry code
arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling
arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic
arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack
arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup
arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN
arm64: efi: Remove CONFIG_VMAP_STACK check
arm64: Mandate VMAP_STACK
arm64: efi: Fix KASAN false positive for EFI runtime stack
arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
arm64/gcs: Don't call gcs_free() during flush_gcs()
arm64: Restrict pagetable teardown to avoid false warning
docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
|
|
With VMAP_STACK now always enabled on arm64, remove all CONFIG_VMAP_STACK
conditionals from overflow stack handling in stacktrace code.
This change unconditionally defines the per-CPU overflow_stack and
stackinfo_get_overflow() helper in arch/arm64/include/asm/stacktrace.h,
and always includes the overflow stack in the stack_info array in
arch/arm64/kernel/stacktrace.c. Also, drop redundant CONFIG_VMAP_STACK
checks from SDEI stack declarations.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-6-8de98ca0f91c@debian.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add arch_stack_walk_reliable(), which will be used during kernel live
patching to detect when threads have completed executing old versions of
functions.
Note that arch_stack_walk_reliable() only needs to guarantee that it
returns an error code when it cannot provide a reliable stacktrace. It
is not required to provide a reliable stacktrace in all scenarios so
long as it returns said error code.
At present we can only reliably unwind up to an exception boundary. In
future we should be able to improve this with additional data from the
compiler (e.g. sframe).
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250320171559.3423224-2-song@kernel.org
[ Mark: Simplify logic, clarify commit message ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Song Liu <song@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250521111000.2237470-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
If kretprobe_find_ret_addr() fails to find the original return address,
it returns 0. Check for this case so that a reliable stacktrace won't
silently ignore it.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Song Liu <song@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250521111000.2237470-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The arm64 stacktrace code has a few error conditions where a
WARN_ON_ONCE() is triggered before the stacktrace is terminated and an
error is returned to the caller. The conditions shouldn't be triggered
when unwinding the current task, but it is possible to trigger these
when unwinding another task which is not blocked, as the stack of that
task is concurrently modified. Kent reports that these warnings can be
triggered while running filesystem tests on bcachefs, which calls the
stacktrace code directly.
To produce a meaningful stacktrace of another task, the task in question
should be blocked, but the stacktrace code is expected to be robust to
cases where it is not blocked. Note that this is purely about not
unuduly scaring the user and/or crashing the kernel; stacktraces in such
cases are meaningless and may leak kernel secrets from the stack of the
task being unwound.
Ideally we'd pin the task in a blocked state during the unwind, as we do
for /proc/${PID}/wchan since commit:
42a20f86dc19f928 ("sched: Add wrapper for get_wchan() to keep task blocked")
... but a bunch of places don't do that, notably /proc/${PID}/stack,
where we don't pin the task in a blocked state, but do restrict the
output to privileged users since commit:
f8a00cef17206ecd ("proc: restrict kernel stack dumps to root")
... and so it's possible to trigger these warnings accidentally, e.g. by
reading /proc/*/stack (as root):
| for n in $(seq 1 10); do
| while true; do cat /proc/*/stack > /dev/null 2>&1; done &
| done
| ------------[ cut here ]------------
| WARNING: CPU: 3 PID: 166 at arch/arm64/kernel/stacktrace.c:207 arch_stack_walk+0x1c8/0x370
| Modules linked in:
| CPU: 3 UID: 0 PID: 166 Comm: cat Not tainted 6.13.0-rc2-00003-g3dafa7a7925d #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : arch_stack_walk+0x1c8/0x370
| lr : arch_stack_walk+0x1b0/0x370
| sp : ffff800080773890
| x29: ffff800080773930 x28: fff0000005c44500 x27: fff00000058fa038
| x26: 000000007ffff000 x25: 0000000000000000 x24: 0000000000000000
| x23: ffffa35a8d9600ec x22: 0000000000000000 x21: fff00000043a33c0
| x20: ffff800080773970 x19: ffffa35a8d960168 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
| x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
| x8 : ffff8000807738e0 x7 : ffff8000806e3800 x6 : ffff8000806e3818
| x5 : ffff800080773920 x4 : ffff8000806e4000 x3 : ffff8000807738e0
| x2 : 0000000000000018 x1 : ffff8000806e3800 x0 : 0000000000000000
| Call trace:
| arch_stack_walk+0x1c8/0x370 (P)
| stack_trace_save_tsk+0x8c/0x108
| proc_pid_stack+0xb0/0x134
| proc_single_show+0x60/0x120
| seq_read_iter+0x104/0x438
| seq_read+0xf8/0x140
| vfs_read+0xc4/0x31c
| ksys_read+0x70/0x108
| __arm64_sys_read+0x1c/0x28
| invoke_syscall+0x48/0x104
| el0_svc_common.constprop.0+0x40/0xe0
| do_el0_svc+0x1c/0x28
| el0_svc+0x30/0xcc
| el0t_64_sync_handler+0x10c/0x138
| el0t_64_sync+0x198/0x19c
| ---[ end trace 0000000000000000 ]---
Fix this by only warning when unwinding the current task. When unwinding
another task the error conditions will be handled by returning an error
without producing a warning.
The two warnings in kunwind_next_frame_record_meta() were added recently
as part of commit:
c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries")
The warning when recovering the fgraph return address has changed form
many times, but was originally introduced back in commit:
9f416319f40cd857 ("arm64: fix unwind_frame() for filtered out fn for function graph tracing")
Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Fixes: 9f416319f40c ("arm64: fix unwind_frame() for filtered out fn for function graph tracing")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241211140704.2498712-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Aishwarya reports that warnings are sometimes seen when running the
ftrace kselftests, e.g.
| WARNING: CPU: 5 PID: 2066 at arch/arm64/kernel/stacktrace.c:141 arch_stack_walk+0x4a0/0x4c0
| Modules linked in:
| CPU: 5 UID: 0 PID: 2066 Comm: ftracetest Not tainted 6.13.0-rc2 #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arch_stack_walk+0x4a0/0x4c0
| lr : arch_stack_walk+0x248/0x4c0
| sp : ffff800083643d20
| x29: ffff800083643dd0 x28: ffff00007b891400 x27: ffff00007b891928
| x26: 0000000000000001 x25: 00000000000000c0 x24: ffff800082f39d80
| x23: ffff80008003ee8c x22: ffff80008004baa8 x21: ffff8000800533e0
| x20: ffff800083643e10 x19: ffff80008003eec8 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff800083640000 x15: 0000000000000000
| x14: 02a37a802bbb8a92 x13: 00000000000001a9 x12: 0000000000000001
| x11: ffff800082ffad60 x10: ffff800083643d20 x9 : ffff80008003eed0
| x8 : ffff80008004baa8 x7 : ffff800086f2be80 x6 : ffff0000057cf000
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800086f2b690
| x2 : ffff80008004baa8 x1 : ffff80008004baa8 x0 : ffff80008004baa8
| Call trace:
| arch_stack_walk+0x4a0/0x4c0 (P)
| arch_stack_walk+0x248/0x4c0 (L)
| profile_pc+0x44/0x80
| profile_tick+0x50/0x80 (F)
| tick_nohz_handler+0xcc/0x160 (F)
| __hrtimer_run_queues+0x2ac/0x340 (F)
| hrtimer_interrupt+0xf4/0x268 (F)
| arch_timer_handler_virt+0x34/0x60 (F)
| handle_percpu_devid_irq+0x88/0x220 (F)
| generic_handle_domain_irq+0x34/0x60 (F)
| gic_handle_irq+0x54/0x140 (F)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0x98
| el1_interrupt+0x34/0x68 (F)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x6c/0x70
| queued_spin_lock_slowpath+0x78/0x460 (P)
The warning in question is:
WARN_ON_ONCE(state->common.pc == orig_pc))
... in kunwind_recover_return_address(), which is triggered when
return_to_handler() is encountered in the trace, but
ftrace_graph_ret_addr() cannot find a corresponding original return
address on the fgraph return stack.
This happens because the stacktrace code encounters an exception
boundary where the LR was not live at the time of the exception, but the
LR happens to contain return_to_handler(); either because the task
recently returned there, or due to unfortunate usage of the LR at a
scratch register. In such cases attempts to recover the return address
via ftrace_graph_ret_addr() may fail, triggering the WARN_ON_ONCE()
above and aborting the unwind (hence the stacktrace terminating after
reporting the PC at the time of the exception).
Handling unreliable LR values in these cases is likely to require some
larger rework, so for the moment avoid this problem by restoring the old
behaviour of skipping the LR at exception boundaries, which the
stacktrace code did prior to commit:
c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries")
This commit is effectively a partial revert, keeping the structures and
logic to explicitly identify exception boundaries while still skipping
reporting of the LR. The logic to explicitly identify exception
boundaries is still useful for general robustness and as a building
block for future support for RELIABLE_STACKTRACE.
Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241211140704.2498712-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When arm64's stack unwinder encounters an exception boundary, it uses
the pt_regs::stackframe created by the entry code, which has a copy of
the PC and FP at the time the exception was taken. The unwinder doesn't
know anything about pt_regs, and reports the PC from the stackframe, but
does not report the LR.
The LR is only guaranteed to contain the return address at function call
boundaries, and can be used as a scratch register at other times, so the
LR at an exception boundary may or may not be a legitimate return
address. It would be useful to report the LR value regardless, as it can
be helpful when debugging, and in future it will be helpful for reliable
stacktrace support.
This patch changes the way we unwind across exception boundaries,
allowing both the PC and LR to be reported. The entry code creates a
frame_record_meta structure embedded within pt_regs, which the unwinder
uses to find the pt_regs. The unwinder can then extract pt_regs::pc and
pt_regs::lr as two separate unwind steps before continuing with a
regular walk of frame records.
When a PC is unwound from pt_regs::lr, dump_backtrace() will log this
with an "L" marker so that it can be identified easily. For example,
an unwind across an exception boundary will appear as follows:
| el1h_64_irq+0x6c/0x70
| _raw_spin_unlock_irqrestore+0x10/0x60 (P)
| __aarch64_insn_write+0x6c/0x90 (L)
| aarch64_insn_patch_text_nosync+0x28/0x80
... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs:lr.
Note that the LR may be stale at the point of the exception, for example,
shortly after a return:
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
... where the LR points a few instructions before the current PC.
This plays nicely with all the other unwind metadata tracking. With the
ftrace_graph profiler enabled globally, and kretprobes installed on
generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered
by magic-sysrq + L reports:
| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
| cpu_startup_entry+0x3c/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x88/0x98
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When analysing a stacktrace it can be useful to know whether an unwound
PC has been rewritten by fgraph or kretprobes, as in some situations
these may be suspect or be known to be unreliable.
This patch adds flags to track when an unwind entry has recovered the PC
from fgraph and/or kretprobes, and updates dump_backtrace() to log when
this is the case.
The flags recorded are:
"F" - the PC was recovered from fgraph
"K" - the PC was recovered from kretprobes
These flags are recorded and logged in addition to the original source
of the unwound PC.
For example, with the ftrace_graph profiler enabled globally, and
kretprobes installed on generic_handle_domain_irq() and
do_interrupt_handler(), a backtrace triggered by magic-sysrq + L
reports:
| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x64/0x68
| default_idle_call+0x34/0x180
| do_idle+0x204/0x268
| cpu_startup_entry+0x40/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x80/0x90
Note that as these flags are reported next to the recovered PC value,
they appear on the callers of instrumented functions. For example
gic_handle_irq() has a "K" marker because generic_handle_domain_irq()
was instrumented with kretprobes and had its return address rewritten.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When analysing a stacktrace it can be useful to know where an unwound PC
came from, as in some situations certain sources may be suspect or known
to be unreliable. In future it would also be useful to track this so
that certain unwind steps can be performed in a stateful manner. For
example when unwinding across an exception boundary, we'd ideally unwind
pt_regs::pc, then pt_regs::lr, then the next frame record.
This patch adds an enumerated set of unwind sources, tracks this during
the unwind, and updates dump_backtrace() to log these for interesting
unwind steps.
The interesting sources recorded are:
"C" - the PC came from the caller of an unwind function.
"T" - the PC came from thread_saved_pc() for a blocked task.
"P" - the PC came from a pt_regs::pc.
"U" - the PC came from an unknown source (indicates an unwinder error).
... with nothing recorded when the PC came from a frame_record::pc as
this is the vastly common case and logging this would make it difficult
to spot the more interesting cases.
For example, when triggering a backtrace via magic-sysrq + L, the CPU
handling the sysrq will have a backtrace whose first element is the
caller (C) of dump_backtrace():
| Call trace:
| show_stack+0x18/0x30 (C)
| dump_stack_lvl+0x60/0x80
| dump_stack+0x18/0x24
| nmi_cpu_backtrace+0xfc/0x140
| ...
... and other CPUs will have a backtrace whose first element is their
pt_regs::pc (P) at the instant the backtrace IPI was taken:
| Call trace:
| _raw_spin_unlock_irqrestore+0x8/0x50 (P)
| wake_up_process+0x18/0x24
| process_timeout+0x14/0x20
| call_timer_fn.isra.0+0x24/0x80
| ...
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently dump_backtrace() can only print the PC value at each step of
the unwind, as this is all the information that arch_stack_walk()
passes to the dump_backtrace_entry() callback.
In future we'd like to print some additional information, such as the
origin of entries (e.g. PC, LR, FP) and/or the reliability thereof.
In preparation for doing so, this patch moves dump_backtrace() over to
kunwind_stack_walk(), which passes the full kunwind_state to the
callback.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently the signal handling code has its own struct frame_record,
the definition of struct pt_regs open-codes a frame record as an array,
and the kernel unwinder hard-codes frame record offsets.
Move to a common struct frame_record that can be used throughout the
kernel.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
ftrace_graph_ret_addr() takes an 'idx' integer pointer that is used to
optimize the stack unwinding process. arm64 currently passes `NULL` for
this parameter which stops it from utilizing these optimizations.
Further, the current code for ftrace_graph_ret_addr() will just return
the passed in return address if it is NULL which will break this usage.
Pass a valid integer pointer to ftrace_graph_ret_addr() similar to
x86_64's stack unwinder.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Fixes: 29c1c24a2707 ("function_graph: Fix up ftrace_graph_ret_addr()")
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618162342.28275-1-puranjay@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, userstacktrace is unsupported for ftrace and uprobe
tracers on arm64. This patch uses the perf_callchain_user() code
as blueprint to implement the arch_stack_walk_user() which add
userstacktrace support on arm64.
Meanwhile, we can use arch_stack_walk_user() to simplify the
implementation of perf_callchain_user().
This patch is tested pass with ftrace, uprobe and perf tracers
profiling userstacktrace cases.
Tested-by: chenqiwu <qiwu.chen@transsion.com>
Signed-off-by: chenqiwu <qiwu.chen@transsion.com>
Link: https://lore.kernel.org/r/20231219022229.10230-1-qiwu.chen@transsion.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
|
|
Make arch_kunwind_consume_entry() as __always_inline otherwise the
compiler might not inline it and allow attaching probes to it.
Without this, just probing arch_kunwind_consume_entry() via
<tracefs>/kprobe_events will crash the kernel on arm64.
The crash can be reproduced using the following compiler and kernel
combination:
clang version 19.0.0git (https://github.com/llvm/llvm-project.git d68d29516102252f6bf6dc23fb22cef144ca1cb3)
commit 87adedeba51a ("Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
[root@localhost ~]# echo 'p arch_kunwind_consume_entry' > /sys/kernel/debug/tracing/kprobe_events
[root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
Modules linked in: aes_ce_blk aes_ce_cipher ghash_ce sha2_ce virtio_net sha256_arm64 sha1_ce arm_smccc_trng net_failover failover virtio_mmio uio_pdrv_genirq uio sch_fq_codel dm_mod dax configfs
CPU: 3 PID: 1405 Comm: bash Not tainted 6.8.0-rc6+ #14
Hardware name: linux,dummy-virt (DT)
pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kprobe_breakpoint_handler+0x17c/0x258
lr : kprobe_breakpoint_handler+0x17c/0x258
sp : ffff800085d6ab60
x29: ffff800085d6ab60 x28: ffff0000066f0040 x27: ffff0000066f0b20
x26: ffff800081fa7b0c x25: 0000000000000002 x24: ffff00000b29bd18
x23: ffff00007904c590 x22: ffff800081fa6590 x21: ffff800081fa6588
x20: ffff00000b29bd18 x19: ffff800085d6ac40 x18: 0000000000000079
x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
x14: ffff80008277a940 x13: 0000000000000003 x12: 0000000000000003
x11: 00000000fffeffff x10: c0000000fffeffff x9 : aa95616fdf80cc00
x8 : aa95616fdf80cc00 x7 : 205d343137373231 x6 : ffff800080fb48ec
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff800085d6a910 x0 : 0000000000000079
Call trace:
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40
Fixes: 1aba06e7b2b4 ("arm64: stacktrace: factor out kunwind_stack_walk()")
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240229231620.24846-1-puranjay12@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This will be used by bpf_throw() to unwind till the program marked as
exception boundary and run the callback with the stack of the main
program.
This is required for supporting BPF exceptions on ARM64.
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240201125225.72796-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently arm64 uses the generic arch_stack_walk() interface for all
stack walking code. This only passes a PC value and cookie to the unwind
callback, whereas we'd like to pass some additional information in some
cases. For example, the BPF exception unwinder wants the FP, for
reliable stacktrace we'll want to perform additional checks on other
portions of unwind state, and we'd like to expand the information
printed by dump_backtrace() to include provenance and reliability
information.
As preparation for all of the above, this patch factors the core unwind
logic out of arch_stack_walk() and into a new kunwind_stack_walk()
function which provides all of the unwind state to a callback function.
The existing arch_stack_walk() interface is implemented atop this.
The kunwind_stack_walk() function is intended to be a private
implementation detail of unwinders in stacktrace.c, and not something to
be exported generally to kernel code. It is __always_inline'd into its
caller so that neither it or its caller appear in stactraces (which is
the existing/required behavior for arch_stack_walk() and friends) and so
that the compiler can optimize away some of the indirection.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
On arm64 we share some unwinding code between the regular kernel
unwinder and the KVM hyp unwinder. Some of this common code only matters
to the regular unwinder, e.g. the `kr_cur` and `task` fields in the
common struct unwind_state.
We're likely to add more state which only matters for regular kernel
unwinding (or only for hyp unwinding). In preparation for such changes,
this patch factors out the kernel-specific state into a new struct
kunwind_state, and updates the kernel unwind code accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently we strip the PAC from pointers using C code, which requires
generating bitmasks, and conditionally clearing/setting bits depending
on bit 55. We can do better by using XPACLRI directly.
When the logic was originally written to strip PACs from user pointers,
contemporary toolchains used for the kernel had assemblers which were
unaware of the PAC instructions. As stripping the PAC from userspace
pointers required unconditional clearing of a fixed set of bits (which
could be performed with a single instruction), it was simpler to
implement the masking in C than it was to make use of XPACI or XPACLRI.
When support for in-kernel pointer authentication was added, the
stripping logic was extended to cover TTBR1 pointers, requiring several
instructions to handle whether to clear/set bits dependent on bit 55 of
the pointer.
This patch simplifies the stripping of PACs by using XPACLRI directly,
as contemporary toolchains do within __builtin_return_address(). This
saves a number of instructions, especially where
__builtin_return_address() does not implicitly strip the PAC but is
heavily used (e.g. with tracepoints). As the kernel might be compiled
with an assembler without knowledge of XPACLRI, it is assembled using
the 'HINT #7' alias, which results in an identical opcode.
At the same time, I've split ptrauth_strip_insn_pac() into
ptrauth_strip_user_insn_pac() and ptrauth_strip_kernel_insn_pac()
helpers so that we can avoid unnecessary PAC stripping when pointer
authentication is not in use in userspace or kernel respectively.
The underlying xpaclri() macro uses inline assembly which clobbers x30.
The clobber causes the compiler to save/restore the original x30 value
in a frame record (protected with PACIASP and AUTIASP when in-kernel
authentication is enabled), so this does not provide a gadget to alter
the return address. Similarly this does not adversely affect unwinding
due to the presence of the frame record.
The ptrauth_user_pac_mask() and ptrauth_kernel_pac_mask() are exported
from the kernel in ptrace and core dumps, so these are retained. A
subsequent patch will move them out of <asm/compiler.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230412160134.306148-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arm64 stacktrace code can be used in kprobe context, and so cannot
be safely probed. Some (but not all) of the unwind functions are
annotated with `NOKPROBE_SYMBOL()` to ensure this, with others markes as
`__always_inline`, relying on the top-level unwind function being marked
as `noinstr`.
This patch has stacktrace.c consistently mark the internal stacktrace
functions as `__always_inline`, removing the need for NOKPROBE_SYMBOL()
as the top-level unwind function (arch_stack_walk()) is marked as
`noinstr`. This is more consistent and is a simpler pattern to follow
for future additions to stacktrace.c.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For historical reasons, the backtrace dumping functions are placed in
the middle of stacktrace.c, despite using functions defined later. For
clarity, and to make subsequent refactoring easier, move the dumping
functions to the end of stacktrace.c
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The function which calls the top-level backtracing function may have
been instrumented with ftrace and/or kprobes, and hence the first return
address may have been rewritten.
Factor out the existing fgraph / kretprobes address recovery, and use
this for the first address. As the comment for the fgraph case isn't all
that helpful, I've also dropped that.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The EFI runtime services run from a dedicated stack now, and so the
stack unwinder needs to be informed about this.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Mark arch_stack_walk() as noinstr instead of notrace and inline functions
called from arch_stack_walk() as __always_inline so that user does not
put any instrumentations on it, because this function can be used from
return_address() which is used by lockdep.
Without this, if the kernel built with CONFIG_LOCKDEP=y, just probing
arch_stack_walk() via <tracefs>/kprobe_events will crash the kernel on
arm64.
# echo p arch_stack_walk >> ${TRACEFS}/kprobe_events
# echo 1 > ${TRACEFS}/events/kprobes/enable
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
PREEMPT SMP
Modules linked in:
CPU: 0 PID: 17 Comm: migration/0 Tainted: G N 6.1.0-rc5+ #6
Hardware name: linux,dummy-virt (DT)
Stopper: 0x0 <- 0x0
pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kprobe_breakpoint_handler+0x178/0x17c
lr : kprobe_breakpoint_handler+0x178/0x17c
sp : ffff8000080d3090
x29: ffff8000080d3090 x28: ffff0df5845798c0 x27: ffffc4f59057a774
x26: ffff0df5ffbba770 x25: ffff0df58f420f18 x24: ffff49006f641000
x23: ffffc4f590579768 x22: ffff0df58f420f18 x21: ffff8000080d31c0
x20: ffffc4f590579768 x19: ffffc4f590579770 x18: 0000000000000006
x17: 5f6b636174735f68 x16: 637261203d207264 x15: 64612e202c30203d
x14: 2074657366666f2e x13: 30633178302f3078 x12: 302b6b6c61775f6b
x11: 636174735f686372 x10: ffffc4f590dc5bd8 x9 : ffffc4f58eb31958
x8 : 00000000ffffefff x7 : ffffc4f590dc5bd8 x6 : 80000000fffff000
x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0df5845798c0 x0 : 0000000000000064
Call trace:
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
Fixes: 39ef362d2d45 ("arm64: Make return_address() use arch_stack_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/166994751368.439920.3236636557520824664.stgit@devnote3
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently unwind_next_frame_record() has an optional callback to convert
the address space of the FP. This is necessary for the NVHE unwinder,
which tracks the stacks in the hyp VA space, but accesses the frame
records in the kernel VA space.
This is a bit unfortunate since it clutters unwind_next_frame_record(),
which will get in the way of future rework.
Instead, this patch changes the NVHE unwinder to track the stacks in the
kernel's VA space and translate to FP prior to calling
unwind_next_frame_record(). This removes the need for the translate_fp()
callback, as all unwinders consistently track stacks in the native
address space of the unwinder.
At the same time, this patch consolidates the generation of the stack
addresses behind the stackinfo_get_*() helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently we call an on_accessible_stack() callback for each step of the
unwinder, requiring redundant work to be performed in the core of the
unwind loop (e.g. disabling preemption around accesses to per-cpu
variables containing stack boundaries). To prevent unwind loops which go
through a stack multiple times, we have to track the set of unwound
stacks, requiring a stack_type enum which needs to cater for all the
stacks of all possible callees. To prevent loops within a stack, we must
track the prior FP values.
This patch reworks the unwinder to minimize the work in the core of the
unwinder, and to remove the need for the stack_type enum. The set of
accessible stacks (and their boundaries) are determined at the start of
the unwind, and the current stack is tracked during the unwind, with
completed stacks removed from the set of accessible stacks. This makes
the boundary checks more accurate (e.g. detecting overlapped frame
records), and removes the need for separate tracking of the prior FP and
visited stacks.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In subsequent patches we'll want to acquire the stack boundaries
ahead-of-time, and we'll need to be able to acquire the relevant
stack_info regardless of whether we have an object the happens to be on
the stack.
This patch replaces the on_XXX_stack() helpers with stackinfo_get_XXX()
helpers, with the caller being responsible for the checking whether an
object is on a relevant stack. For the moment this is moved into the
on_accessible_stack() functions, making these slightly larger;
subsequent patches will remove the on_accessible_stack() functions and
simplify the logic.
The on_irq_stack() and on_task_stack() helpers are kept as these are
used by IRQ entry sequences and stackleak respectively. As they're only
used as predicates, the stack_info pointer parameter is removed in both
cases.
As the on_accessible_stack() functions are always passed a non-NULL info
pointer, these now update info unconditionally. When updating the type
to STACK_TYPE_UNKNOWN, the low/high bounds are also modified, but as
these will not be consumed this should have no adverse affect.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
For clarity and ease of maintenance, it would be helpful for all the
stack helpers to be in the same place.
Move the SDEI stack helpers into the stacktrace code where all the other
stack helpers live.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The unwind_next_common() function unwinds a single frame record. There
are other unwind steps (e.g. unwinding through trampolines) which are
handled in the regular kernel unwinder, and in future there may be other
common unwind helpers.
Clarify the purpose of unwind_next_common() by renaming it to
unwind_next_frame_record(). At the same time, add commentary, and delete
the redundant comment at the top of asm/stacktrace/common.h.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently unwind_next_common() takes a pointer to a stack_info which is
only ever used within unwind_next_common().
Make it a local variable and simplify callers.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Having multiple versions of on_accessible_stack() (one per unwinder)
makes it very hard to reason about what is used where due to the
complexity of the various includes, the forward declarations, and
the reliance on everything being 'inline'.
Instead, move the code back where it should be. Each unwinder
implements:
- on_accessible_stack() as well as the helpers it depends on,
- unwind()/unwind_next(), as they pass on_accessible_stack as
a parameter to unwind_next_common() (which is the only common
code here)
This hardly results in any duplication, and makes it much
easier to reason about the code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20220727142906.1856759-4-maz@kernel.org
|
|
Move unwind() to stacktrace/common.h, and as a result
the kernel unwind_next() to asm/stacktrace.h. This allow
reusing unwind() in the implementation of the nVHE HYP
stack unwinder, later in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-6-kaleshsingh@google.com
|
|
The unwinder code is made reusable so that it can be used to
unwind various types of stacks. One usecase is unwinding the
nVHE hyp stack from the host (EL1) in non-protected mode. This
means that the unwinder must be able to translate HYP stack
addresses to kernel addresses.
Add a callback (stack_trace_translate_fp_fn) to allow specifying
the translation function.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-5-kaleshsingh@google.com
|
|
Move common unwind_next logic to stacktrace/common.h. This allows
reusing the code in the implementation the nVHE hypervisor stack
unwinder, later in this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-4-kaleshsingh@google.com
|
|
In order to reuse the arm64 stack unwinding logic for the nVHE
hypervisor stack, move the common code to a shared header
(arch/arm64/include/asm/stacktrace/common.h).
The nVHE hypervisor cannot safely link against kernel code, so we
make use of the shared header to avoid duplicated logic later in
this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-2-kaleshsingh@google.com
|
|
Copy the task argument passed to arch_stack_walk() to unwind_state so that
it can be passed to unwind functions via unwind_state rather than as a
separate argument. The task is a fundamental part of the unwind state.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-3-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
unwind_init() is currently a single function that initializes all of the
unwind state. Split it into the following functions and call them
appropriately:
- unwind_init_from_regs() - initialize from regs passed by caller.
- unwind_init_from_caller() - initialize for the current task
from the caller of arch_stack_walk().
- unwind_init_from_task() - initialize from the saved state of a
task other than the current task. In this case, the other
task must not be running.
This is done for two reasons:
- the different ways of initializing are clear
- specialized code can be added to each initializer in the future.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-2-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Use the non-atomic version of set_bit() in arch/arm64/kernel/stacktrace.c,
as there is no concurrent accesses to frame->prev_type.
This speeds up stack trace collection and improves the boot time of
Generic KASAN by 2-5%.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/23dfa36d1cc91e4a1059945b7834eac22fb9854d.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Disable KASAN instrumentation of arch/arm64/kernel/stacktrace.c.
This speeds up Generic KASAN by 5-20%.
As a side-effect, KASAN is now unable to detect bugs in the stack trace
collection code. This is taken as an acceptable downside.
Also replace READ_ONCE_NOCHECK() with READ_ONCE() in stacktrace.c.
As the file is now not instrumented, there is no need to use the
NOCHECK version of READ_ONCE().
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/c4c944a2a905e949760fbeb29258185087171708.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For historical reasons, the naming of parameters and their types in the
arm64 stacktrace code differs from that used in generic code and other
architectures, even though the types are equivalent.
For consistency and clarity, use the generic names.
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Rename "struct stackframe" to "struct unwind_state" for consistency and
better naming. Accordingly, rename variable/argument "frame" to "state".
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Rename unwinder functions for consistency and better naming.
- Rename start_backtrace() to unwind_init().
- Rename unwind_frame() to unwind_next().
- Rename walk_stackframe() to unwind().
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Now that arm64 uses arch_stack_walk() consistently, struct stackframe is
only used within stacktrace.c. To make it easier to read and maintain
this code, it would be nicer if the definition were there too.
Move the definition into stacktrace.c.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviwed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The comment at the top of stacktrace.c isn't all that helpful, as it's
not associated with the code which inspects the frame record, and the
code example isn't representative of common code generation today.
Delete it.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, there is a check for a NULL task in unwind_frame(). It is not
needed since all current callers pass a non-NULL task.
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Mark the start_backtrace() as notrace and NOKPROBE_SYMBOL
because this function is called from ftrace and lockdep to
get the caller address via return_address(). The lockdep
is used in kprobes, it should also be NOKPROBE_SYMBOL.
Fixes: b07f3499661c ("arm64: stacktrace: Move start_backtrace() out of the header")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Now that open-coded stack unwinds have been converted to
arch_stack_walk(), we no longer need to expose any of unwind_frame(),
walk_stackframe(), or start_backtrace() outside of stacktrace.c.
Make those functions private to stacktrace.c, removing their prototypes
from <asm/stacktrace.h> and marking them static.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic.
Currently, dump_backtrace() walks the stack of the current task or a
blocked task by calling stact_backtrace() and iterating unwind steps
using unwind_frame(). This can be written more simply in terms of
arch_stack_walk(), considering three distinct cases:
1) When unwinding a blocked task, start_backtrace() is called with the
blocked task's saved PC and FP, and the unwind proceeds immediately
from this point without skipping any entries. This is functionally
equivalent to calling arch_stack_walk() with the blocked task, which
will start with the task's saved PC and FP.
There is no functional change to this case.
2) When unwinding the current task without regs, start_backtrace() is
called with dump_backtrace() as the PC and __builtin_frame_address(0)
as the next frame, and the unwind proceeds immediately without
skipping. This is *almost* functionally equivalent to calling
arch_stack_walk() for the current task, which will start with its
caller (i.e. an offset into dump_backtrace()) as the PC, and the
callers frame record as the next frame.
The only difference being that dump_backtrace() will be reported with
an offset (which is strictly more correct than currently). Otherwise
there is no functional cahnge to this case.
3) When unwinding the current task with regs, start_backtrace() is
called with dump_backtrace() as the PC and __builtin_frame_address(0)
as the next frame, and the unwind is performed silently until the
next frame is the frame pointed to by regs->fp. Reporting starts
from regs->pc and continues from the frame in regs->fp.
Historically, this pre-unwind was necessary to correctly record
return addresses rewritten by the ftrace graph calller, but this is
no longer necessary as these are now recovered using the FP since
commit:
c6d3cd32fd0064af ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR")
This pre-unwind is not necessary to recover return addresses
rewritten by kretprobes, which historically were not recovered, and
are now recovered using the FP since commit:
cd9bc2c9258816dc ("arm64: Recover kretprobe modified return address in stacktrace")
Thus, this is functionally equivalent to calling arch_stack_walk()
with the current task and regs, which will start with regs->pc as the
PC and regs->fp as the next frame, without a pre-unwind.
This patch makes dump_backtrace() use arch_stack_walk(). This simplifies
dump_backtrace() and will permit subsequent changes to the unwind code.
Aside from the improved reporting when unwinding current without regs,
there should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: elaborate commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.
Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
tracer is in use, unwind_frame() may erroneously associate a traced
function with an incorrect return address. This can happen when starting
an unwind from a pt_regs, or when unwinding across an exception
boundary.
This can be seen when recording with perf while the function graph
tracer is in use. For example:
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # perf record -g -e raw_syscalls:sys_enter:k /bin/true
| # perf report
... reports the callchain erroneously as:
| el0t_64_sync
| el0t_64_sync_handler
| el0_svc_common.constprop.0
| perf_callchain
| get_perf_callchain
| syscall_trace_enter
| syscall_trace_enter
... whereas when the function graph tracer is not in use, it reports:
| el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| el0_svc_common.constprop.0
| syscall_trace_enter
| syscall_trace_enter
The underlying problem is that ftrace_graph_get_ret_stack() takes an
index offset from the most recent entry added to the fgraph return
stack. We start an unwind at offset 0, and increment the offset each
time we encounter a rewritten return address (i.e. when we see
`return_to_handler`). This is broken in two cases:
1) Between creating a pt_regs and starting the unwind, function calls
may place entries on the stack, leaving an arbitrary offset which we
can only determine by performing a full unwind from the caller of the
unwind code (and relying on none of the unwind code being
instrumented).
This can result in erroneous entries being reported in a backtrace
recorded by perf or kfence when the function graph tracer is in use.
Currently show_regs() is unaffected as dump_backtrace() performs an
initial unwind.
2) When unwinding across an exception boundary (whether continuing an
unwind or starting a new unwind from regs), we currently always skip
the LR of the interrupted context. Where this was live and contained
a rewritten address, we won't consume the corresponding fgraph ret
stack entry, leaving subsequent entries off-by-one.
This can result in erroneous entries being reported in a backtrace
performed by any in-kernel unwinder when that backtrace crosses an
exception boundary, with entries after the boundary being reported
incorrectly. This includes perf, kfence, show_regs(), panic(), etc.
To fix this, we need to be able to uniquely identify each rewritten
return address such that we can map this back to the original return
address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
each rewritten return address with a unique location on the stack. As
the return address is passed in the LR (and so is not guaranteed a
unique location in memory), we use the FP upon entry to the function
(i.e. the address of the caller's frame record) as the return address
pointer. Any nested call will have a different FP value as the caller
must create its own frame record and update FP to point to this.
Since ftrace_graph_ret_addr() requires the return address with the PAC
stripped, the stripping of the PAC is moved before the fixup of the
rewritten address. As we would unconditionally strip the PAC, moving
this earlier is not harmful, and we can avoid a redundant strip in the
return address fixup code.
I've tested this with the perf case above, the ftrace selftests, and
a number of ad-hoc unwinder tests. The tests all pass, and I have seen
no unexpected behaviour as a result of this change. I've tested with
pointer authentication under QEMU TCG where magic-sysrq+l correctly
recovers the original return addresses.
Note that this doesn't fix the issue of skipping a live LR at an
exception boundary, which is a more general problem and requires more
substantial rework. Were we to consume the LR in all cases this would
result in warnings where the interrupted context's LR contains
`return_to_handler`, but the FP has been altered, e.g.
| func:
| <--- ftrace entry ---> // logs FP & LR, rewrites LR
| STP FP, LR, [SP, #-16]!
| MOV FP, SP
| <--- INTERRUPT --->
... as ftrace_graph_get_ret_stack() fill not find a matching entry,
triggering the WARN_ON_ONCE() in unwind_frame().
Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|