Age | Commit message (Collapse) | Author |
|
* for-next/mm:
arm64: mm: always map fixmap at page granularity
arm64: mm: move fixmap code to its own file
arm64: add FIXADDR_TOT_{START,SIZE}
Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
arm: uaccess: Remove memcpy_page_flushcache()
mm,kfence: decouple kfence from page granularity mapping judgement
|
|
* for-next/misc:
arm64: kexec: include reboot.h
arm64: delete dead code in this_cpu_set_vectors()
arm64: kernel: Fix kernel warning when nokaslr is passed to commandline
arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
arm64/sme: Fix some comments of ARM SME
arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2()
arm64/signal: Use system_supports_tpidr2() to check TPIDR2
arm64: compat: Remove defines now in asm-generic
arm64: kexec: remove unnecessary (void*) conversions
arm64: armv8_deprecated: remove unnecessary (void*) conversions
firmware: arm_sdei: Fix sleep from invalid context BUG
|
|
* for-next/kdump:
arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones
arm64: kdump: do not map crashkernel region specifically
arm64: kdump : take off the protection on crashkernel memory region
|
|
* for-next/ftrace:
arm64: ftrace: Simplify get_ftrace_plt
arm64: ftrace: Add direct call support
ftrace: selftest: remove broken trace_direct_tramp
ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS
ftrace: Store direct called addresses in their ops
ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs
ftrace: Remove the legacy _ftrace_direct API
ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multi
ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
|
|
* for-next/cpufeature:
arm64/cpufeature: Use helper macro to specify ID register for capabilites
arm64/cpufeature: Consistently use symbolic constants for min_field_value
arm64/cpufeature: Pull out helper for CPUID register definitions
|
|
* for-next/asm:
arm64: uaccess: remove unnecessary earlyclobber
arm64: uaccess: permit put_{user,kernel} to use zero register
arm64: uaccess: permit __smp_store_release() to use zero register
arm64: atomics: lse: improve cmpxchg implementation
|
|
* for-next/acpi:
ACPI: AGDI: Improve error reporting for problems during .remove()
|
|
Include reboot.h in machine_kexec.c for declaration of
machine_crash_shutdown.
gcc-12 with W=1 reports:
arch/arm64/kernel/machine_kexec.c:257:6: warning: no previous prototype for 'machine_crash_shutdown' [-Wmissing-prototypes]
257 | void machine_crash_shutdown(struct pt_regs *regs)
No functional changes intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230418-arm64-kexec-include-reboot-v1-1-8453fd4fb3fb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The "slot" variable is an enum, and in this context it is an unsigned
int. So the type means it can never be negative and also we never pass
invalid data to this function. If something did pass invalid data then
this check would be insufficient protection.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/73859c9e-dea0-4764-bf01-7ae694fa2e37@kili.mountain
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When defining which value to look for in a system register field we
currently manually specify the register, field shift, width and sign and
the value to look for. This opens the potential for error with for example
the wrong field width or sign being specified, an enumeration value for
a different similarly named field or letting something be initialised to 0.
Since we now generate defines for all the ID registers we now have named
constants for all of these things generated from the system register
description, meaning that we can generate initialisation for all the fields
used in matching from a minimal specification of register, field and match
value. This is both shorter and eliminates or makes build failures several
potential errors.
No change in the generated binary.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-3-4c8f28a6f203@kernel.org
[will: Drop explicit '.sign' assignment for BTI feature]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A number of the cpufeatures use raw numbers for the minimum field values
specified rather than symbolic constants. In preparation for the use of
helper macros replace all these with the appropriate constants.
No change in the generated binary.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-2-4c8f28a6f203@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We use the same structure to match hwcaps and CPU features so we can use
the same helper to generate the fields required. Pull the portion of the
current hwcaps helper that initialises the fields out into a separate
define placed earlier in the file so we can use it for cpufeatures.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230303-arm64-cpufeature-helpers-v2-1-4c8f28a6f203@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Returning an error value in a platform driver's remove callback results in
a generic error message being emitted by the driver core, but otherwise it
doesn't make a difference. The device goes away anyhow.
So instead of triggering the generic platform error message, emit a more
helpful message if a problem occurs and return 0 to suppress the generic
message.
This patch is a preparation for making platform remove callbacks return
void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20221014160623.467195-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Will Deacon <will@kernel.org>
|
|
'Unknown kernel command line parameters "nokaslr", will be passed to
user space' message is noticed in the dmesg when nokaslr is passed to
the kernel commandline on ARM64 platform. This is because nokaslr param
is handled by early cpufeature detection infrastructure and the parameter
is never consumed by a kernel param handler. Fix this warning by
providing a dummy kernel param handler for nokaslr.
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Link: https://lore.kernel.org/r/20230412043258.397455-1-quic_pkondeti@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently only the first attempt to single-step has any effect. After
that all further stepping remains "stuck" at the same program counter
value.
Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12,
PSTATE.SS=1 should be set at each step before transferring the PE to the
'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set
since the second single-step.
After the first single-step, the PE transferes to the 'Inactive' state,
with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to
kernel_active_single_step()=true. Then the PE transferes to the
'Active-pending' state when ERET and returns to the debugger by step
exception.
Before this patch:
==================
Entering kdb (current=0xffff3376039f0000, pid 1) on processor 0 due to Keyboard Entry
[0]kdb>
[0]kdb>
[0]kdb> bp write_sysrq_trigger
Instruction(i) BP #0 at 0xffffa45c13d09290 (write_sysrq_trigger)
is enabled addr at ffffa45c13d09290, hardtype=0 installed=0
[0]kdb> go
$ echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to Breakpoint @ 0xffffad651a309290
[1]kdb> ss
Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to SS trap @ 0xffffad651a309294
[1]kdb> ss
Entering kdb (current=0xffff4f7e453f8000, pid 175) on processor 1 due to SS trap @ 0xffffad651a309294
[1]kdb>
After this patch:
=================
Entering kdb (current=0xffff6851c39f0000, pid 1) on processor 0 due to Keyboard Entry
[0]kdb> bp write_sysrq_trigger
Instruction(i) BP #0 at 0xffffc02d2dd09290 (write_sysrq_trigger)
is enabled addr at ffffc02d2dd09290, hardtype=0 installed=0
[0]kdb> go
$ echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to Breakpoint @ 0xffffc02d2dd09290
[1]kdb> ss
Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd09294
[1]kdb> ss
Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd09298
[1]kdb> ss
Entering kdb (current=0xffff6851c53c1840, pid 174) on processor 1 due to SS trap @ 0xffffc02d2dd0929c
[1]kdb>
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Co-developed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20230202073148.657746-3-sumit.garg@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When TIF_SME is clear, fpsimd_restore_current_state will disable
SME trap during ret_to_user, then SME access trap is impossible
in userspace, not SVE.
Besides, fix typo: alocated->allocated.
Signed-off-by: Dongxu Sun <sundongxu3@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230317124915.1263-5-sundongxu3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move tpidr2 sigframe allocation from under the checking of
system_supports_sme() to the checking of system_supports_tpidr2().
Signed-off-by: Dongxu Sun <sundongxu3@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230317124915.1263-3-sundongxu3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since commit a9d6915859501("arm64/sme: Implement support
for TPIDR2"), We introduced system_supports_tpidr2() for
TPIDR2 handling. Let's use the specific check instead.
No functional changes.
Signed-off-by: Dongxu Sun <sundongxu3@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230317124915.1263-2-sundongxu3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
memory zones
In commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for
platforms with no DMA memory zones"), reserve_crashkernel() is called
much earlier in arm64_memblock_init() to avoid causing base apge
mapping on platforms with no DMA meomry zones.
With taking off protection on crashkernel memory region, no need to call
reserve_crashkernel() specially in advance. The deferred invocation of
reserve_crashkernel() in bootmem_init() can cover all cases. So revert
the whole commit now.
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-4-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
After taking off the protection functions on crashkernel memory region,
there's no need to map crashkernel region with page granularity during
linear mapping.
With this change, the system can make use of block or section mapping
on linear region to largely improve perforcemence during system bootup
and running.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Problem:
=======
On arm64, block and section mapping is supported to build page tables.
However, currently it enforces to take base page mapping for the whole
linear mapping if CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled and
crashkernel kernel parameter is set. This will cause longer time of the
linear mapping process during bootup and severe performance degradation
during running time.
Root cause:
==========
On arm64, crashkernel reservation relies on knowing the upper limit of
low memory zone because it needs to reserve memory in the zone so that
devices' DMA addressing in kdump kernel can be satisfied. However, the
upper limit of low memory on arm64 is variant. And the upper limit can
only be decided late till bootmem_init() is called [1].
And we need to map the crashkernel region with base page granularity when
doing linear mapping, because kdump needs to protect the crashkernel region
via set_memory_valid(,0) after kdump kernel loading. However, arm64 doesn't
support well on splitting the built block or section mapping due to some
cpu reststriction [2]. And unfortunately, the linear mapping is done before
bootmem_init().
To resolve the above conflict on arm64, the compromise is enforcing to
take base page mapping for the entire linear mapping if crashkernel is
set, and CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabed. Hence
performance is sacrificed.
Solution:
=========
Comparing with the base page mapping for the whole linear region, it's
better to take off the protection on crashkernel memory region for the
time being because the anticipated stamping on crashkernel memory region
could only happen in a chance in one million, while the base page mapping
for the whole linear region is mitigating arm64 systems with crashkernel
set always.
[1]
https://lore.kernel.org/all/YrIIJkhKWSuAqkCx@arm.com/T/#u
[2]
https://lore.kernel.org/linux-arm-kernel/20190911182546.17094-1-nsaenzjulienne@suse.de/T/
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-2-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some generic COMPAT definitions have been consolidated in
asm-generic/compat.h by commit 84a0c977ab98
("asm-generic: compat: Cleanup duplicate definitions")
Remove those that are already defined to the same value there from
arm64 asm/compat.h.
Signed-off-by: Teo Couprie Diaz <teo.coupriediaz@arm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230314140038.252908-1-teo.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Today the fixmap code largely maps elements at PAGE_SIZE granularity,
but we special-case the FDT mapping such that it can be mapped with 2M
block mappings when 4K pages are in use. The original rationale for this
was simplicity, but it has some unfortunate side-effects, and
complicates portions of the fixmap code (i.e. is not so simple after
all).
The FDT can be up to 2M in size but is only required to have 8-byte
alignment, and so it may straddle a 2M boundary. Thus when using 2M
block mappings we may map up to 4M of memory surrounding the FDT. This
is unfortunate as most of that memory will be unrelated to the FDT, and
any pages which happen to share a 2M block with the FDT will by mapped
with Normal Write-Back Cacheable attributes, which might not be what we
want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).
The logic to handle mapping the FDT with 2M blocks requires some special
cases in the fixmap code, and ties it to the early page table
configuration by virtue of the SWAPPER_TABLE_SHIFT and
SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
map the FDT.
This patch simplifies the FDT logic and removes the unnecessary mappings
of surrounding pages by always mapping the FDT at page granularity as
with all other fixmap mappings. To do so we statically reserve multiple
PTE tables to cover the fixmap VA range. Since the FDT can be at most
2M, for 4K pages we only need to allocate a single additional PTE table,
and for 16K and 64K pages the existing single PTE table is sufficient.
The PTE table allocation scales with the number of slots reserved in the
fixmap, and so this also makes it easier to add more fixmap entries if
we require those in future.
Our VA layout means that the fixmap will always fall within a single PMD
table (and consequently, within a single PUD/P4D/PGD entry), which we
can verify at compile time with a static_assert(). With that assert a
number of runtime warnings become impossible, and are removed.
I've boot-tested this patch with both 4K and 64K pages.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Over time, arm64's mm/mmu.c has become increasingly large and painful to
navigate. Move the fixmap code to its own file where it can be understood in
isolation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the
runtime fixmap slots (and not the boot-time fixmap slots), but the code
for creating the fixmap assumes that these definitions cover the entire
fixmap range. This means that the ptdump boundaries are reported in a
misleading way, missing the VA region of the runtime slots. In theory
this could also cause the fixmap creation to go wrong if the boot-time
fixmap slots end up spilling into a separate PMD entry, though luckily
this is not currently the case in any configuration.
While it seems like we could extend FIXADDR_{START,SIZE} to cover the
entire fixmap area, core code relies upon these *only* covering the
runtime slots. For example, fix_to_virt() and virt_to_fix() try to
reject manipulation of the boot-time slots based upon
FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can
handle any fixmap slot.
This patch follows the lead of x86 in commit:
55f49fcb879fbeeb ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP")
... and add new FIXADDR_TOT_{START,SIZE} definitions which cover the
entire fixmap area, using these for the fixmap creation and ptdump code.
As the boot-time fixmap slots are now rejected by fix_to_virt(),
the early_fixmap_init() code is changed to consistently use
__fix_to_virt(), as it already does in a few cases.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Following recent refactorings, the get_ftrace_plt function only ever
gets called with addr = FTRACE_ADDR so its code can be simplified to
always return the ftrace trampoline plt.
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230405180250.2046566-3-revest@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This builds up on the CALL_OPS work which extends the ftrace patchsite
on arm64 with an ops pointer usable by the ftrace trampoline.
This ops pointer is valid at all time. Indeed, it is either pointing to
ftrace_list_ops or to the single ops which should be called from that
patchsite.
There are a few cases to distinguish:
- If a direct call ops is the only one tracing a function:
- If the direct called trampoline is within the reach of a BL
instruction
-> the ftrace patchsite jumps to the trampoline
- Else
-> the ftrace patchsite jumps to the ftrace_caller trampoline which
reads the ops pointer in the patchsite and jumps to the direct
call address stored in the ops
- Else
-> the ftrace patchsite jumps to the ftrace_caller trampoline and its
ops literal points to ftrace_list_ops so it iterates over all
registered ftrace ops, including the direct call ops and calls its
call_direct_funcs handler which stores the direct called
trampoline's address in the ftrace_regs and the ftrace_caller
trampoline will return to that address instead of returning to the
traced function
Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230405180250.2046566-2-revest@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace into for-next/ftrace
Pull in ftrace trampoline updates from Steve so that we can implement
support for direct calls for arm64 on top:
tracing: Direct trampoline updates
Updates to the direct trampoline to allow ARM64 to have direct
trampolines.
|
|
arch_dma_prep_coherent()""
This reverts commit b7d9aae404841d9999b7476170867ae441e948d2.
With the Qualcomm remoteproc driver now modified to use a carveout
memory region in 57f72170a2b2 ("remoteproc: qcom_q6v5_mss: Use a
carveout to authenticate modem headers"), we can reinstate c44094eee32f
("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
which relaxes the arm64 implementation of arch_dma_prep_coherent() to
perform only a data cache clean operation, rather than a
clean-and-invalidate.
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently the asm constraints for __get_mem_asm() mark the value
register as an earlyclobber operand. This means that the compiler can't
reuse the same register for both the address and value, even when the
value is not subsequently used.
There's no need for the value register to be marked as earlyclobber, as
it's only written to after the address register is consumed, even when
the access faults.
Remove the unnecessary earlyclobber.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230314153700.787701-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently the asm constraints for __put_mem_asm() require that the value
is placed in a "real" GPR (i.e. one other than [XW]ZR or SP). This means
that for cases such as:
__put_user(0, addr)
... the compiler has to move '0' into "real" GPR, e.g.
mov xN, #0
sttr xN, [<addr>]
This is unfortunate, as using the zero register would require fewer
instructions and save a "real" GPR for other usage, allowing the
compiler to generate:
sttr xzr, [<addr>]
Modify the asm constaints for __put_mem_asm() to permit the use of the
zero register for the value.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230314153700.787701-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently the asm constraints for __smp_store_release() require that the
value is placed in a "real" GPR (i.e. one other than [XW]ZR or SP).
This means that for cases such as:
__smp_store_release(ptr, 0)
... the compiler has to move '0' into "real" GPR, e.g.
mov xN, #0
stlr xN, [<addr>]
This is unfortunate, as using the zero register would require fewer
instructions and save a "real" GPR for other usage, allowing the
compiler to generate:
stlr xzr, [<addr>]
Modify the asm constaints for __smp_store_release() to permit the use of
the zero register for the value.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230314153700.787701-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For historical reasons, the LSE implementation of cmpxchg*() hard-codes
the GPRs to use, and shuffles registers around with MOVs. This is no
longer necessary, and can be simplified.
When the LSE cmpxchg implementation was added in commit:
c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU")
... the LL/SC implementation of cmpxchg() would be placed out-of-line,
and the in-line assembly for cmpxchg would default to:
NOP
BL <ll_sc_cmpxchg*_implementation>
NOP
The LL/SC implementation of each cmpxchg() function accepted arguments
as per AAPCS64 rules, to it was necessary to place the pointer in x0,
the older value in X1, and the new value in x2, and acquire the return
value from x0. The LL/SC implementation required a temporary register
(e.g. for the STXR status value). As the LL/SC implementation preserved
the old value, the LSE implementation does likewise.
Since commit:
addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
... the LSE and LL/SC implementations of cmpxchg are inlined as separate
asm blocks, with another branch choosing between thw two. Due to this,
it is no longer necessary for the LSE implementation to match the
register constraints of the LL/SC implementation. This was partially
dealt with by removing the hard-coded use of x30 in commit:
3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")
... but we didn't clean up the hard-coding of x0, x1, and x2.
This patch simplifies the LSE implementation of cmpxchg, removing the
register shuffling and directly clobbering the 'old' argument. This
gives the compiler greater freedom for register allocation, and avoids
redundant work.
The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to
the same register when the initial values of the two are the same, e.g.
resulting in:
CAS X0, X0, [X1]
This is safe as Rs is only written back after the initial values of Rs
and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid
when Rs == Rt.
The new constraints also permit 'new' to be allocated to the zero
register, avoiding a MOV in a few cases. The same cannot be done for
'old' as it is both an input and output, and any caller of cmpxchg()
should care about the output value. Note that for CAS* the use of the
zero register never affects the ordering (while for SWP* the use of the
zero regsiter for the 'old' value drops any ACQUIRE semantic).
Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the
resulting Image is the same size due to internal alignment and padding:
[mark@lakrids:~/src/linux]% ls -al vmlinux-*
-rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
-rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
[mark@lakrids:~/src/linux]% ls -al Image-*
-rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
-rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before
This patch does not touch cmpxchg_double*() as that requires contiguous
register pairs, and separate patches will replace it with cmpxchg128*().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230314153700.787701-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Pointer variables of void * type do not require type cast.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20230303025715.32570-1-yuzhe@nfschina.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Pointer variables of void * type do not require type cast.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20230303025047.19717-1-yuzhe@nfschina.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra
triggers:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by cpuhp/0/24:
#0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
#1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
#2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130
irq event stamp: 36
hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0
hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248
softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...]
Hardware name: WIWYNN Mt.Jade Server [...]
Call trace:
dump_backtrace+0x114/0x120
show_stack+0x20/0x70
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x228
rt_spin_lock+0x70/0x120
sdei_cpuhp_up+0x3c/0x130
cpuhp_invoke_callback+0x250/0xf08
cpuhp_thread_fun+0x120/0x248
smpboot_thread_fn+0x280/0x320
kthread+0x130/0x140
ret_from_fork+0x10/0x20
sdei_cpuhp_up() is called in the STARTING hotplug section,
which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry
instead to execute the cpuhp cb later, with preemption enabled.
SDEI originally got its own cpuhp slot to allow interacting
with perf. It got superseded by pNMI and this early slot is not
relevant anymore. [1]
Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the
calling CPU. It is checked that preemption is disabled for them.
_ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'.
Preemption is enabled in those threads, but their cpumask is limited
to 1 CPU.
Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb
don't trigger them.
Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call
which acts on the calling CPU.
[1]:
https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().
Remove the unnecessary memcpy_page_flushcache() call.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Dan Williams" <dan.j.williams@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221230-kmap-x86-v1-3-15f1ecccab50@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Kfence only needs its pool to be mapped as page granularity, if it is
inited early. Previous judgement was a bit over protected. From [1], Mark
suggested to "just map the KFENCE region a page granularity". So I
decouple it from judgement and do page granularity mapping for kfence
pool only. Need to be noticed that late init of kfence pool still requires
page granularity mapping.
Page granularity mapping in theory cost more(2M per 1GB) memory on arm64
platform. Like what I've tested on QEMU(emulated 1GB RAM) with
gki_defconfig, also turning off rodata protection:
Before:
[root@liebao ]# cat /proc/meminfo
MemTotal: 999484 kB
After:
[root@liebao ]# cat /proc/meminfo
MemTotal: 1001480 kB
To implement this, also relocate the kfence pool allocation before the
linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys
addr, __kfence_pool is to be set after linear mapping set up.
LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.
On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.
Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.
On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.
To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.
Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
trampoline. In this case, the call_direct_funcs ops is in charge of
setting the direct call trampoline's address in a struct ftrace_regs
Since:
commit 9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")
The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.
Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
All direct calls are now registered using the register_ftrace_direct API
so each ops can jump to only one direct-called trampoline.
By storing the direct called trampoline address directly in the ops we
can save one hashmap lookup in the direct call ops and implement arm64
direct calls on top of call ops.
Link: https://lkml.kernel.org/r/20230321140424.345218-6-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Now that the original _ftrace_direct APIs are gone, the "_multi"
suffixes only add confusion.
Link: https://lkml.kernel.org/r/20230321140424.345218-5-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This API relies on a single global ops, used for all direct calls
registered with it. However, to implement arm64 direct calls, we need
each ops to point to a single direct call trampoline.
Link: https://lkml.kernel.org/r/20230321140424.345218-4-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The _multi API requires that users keep their own ops but can enforce
that an op is only associated to one direct call.
Link: https://lkml.kernel.org/r/20230321140424.345218-3-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
A common pattern when using the ftrace_direct_multi API is to unregister
the ops and also immediately free its filter. We've noticed it's very
easy for users to miss calling ftrace_free_filter().
This adds a "free_filters" argument to unregister_ftrace_direct_multi()
to both remind the user they should free filters and also to make their
life easier.
Link: https://lkml.kernel.org/r/20230321140424.345218-2-revest@chromium.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix setting affinity of hwlat threads in containers
Using sched_set_affinity() has unwanted side effects when being
called within a container. Use set_cpus_allowed_ptr() instead
- Fix per cpu thread management of the hwlat tracer:
- Do not start per_cpu threads if one is already running for the CPU
- When starting per_cpu threads, do not clear the kthread variable
as it may already be set to running per cpu threads
- Fix return value for test_gen_kprobe_cmd()
On error the return value was overwritten by being set to the result
of the call from kprobe_event_delete(), which would likely succeed,
and thus have the function return success
- Fix splice() reads from the trace file that was broken by commit
36e2c7421f02 ("fs: don't allow splice read/write without explicit
ops")
- Remove obsolete and confusing comment in ring_buffer.c
The original design of the ring buffer used struct page flags for
tricks to optimize, which was shortly removed due to them being
tricks. But a comment for those tricks remained
- Set local functions and variables to static
* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
ring-buffer: remove obsolete comment for free_buffer_page()
tracing: Make splice_read available again
ftrace: Set direct_ops storage-class-specifier to static
trace/hwlat: Do not start per-cpu thread if it is already running
trace/hwlat: Do not wipe the contents of per-cpu thread data
tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
tracing: Fix wrong return in kprobe_event_gen_test.c
|
|
There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.
Test case:
# cd /sys/kernel/tracing
# echo 0 > tracing_on
# echo round-robin > hwlat_detector/mode
# echo hwlat > current_tracer
# unshare --fork --pid bash -c 'echo 1 > tracing_on'
# dmesg -c
Actual behavior:
[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none
Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The comment refers to mm/slob.c which is being removed. It comes from
commit ed56829cb319 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit e4c2ce82ca27 ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20230315142446.27040-1-vbabka@suse.cz
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Reported-by: Mike Rapoport <mike.rapoport@gmail.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.
This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.
Link: https://lore.kernel.org/linux-trace-kernel/20230314013707.28814-1-sfoon.kim@samsung.com
Cc: stable@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|