Age | Commit message (Collapse) | Author |
|
LLD will occasionally error out with a '__init_end does not converge'
error if INIT_IDMAP_DIR_SIZE is defined in terms of _end, as this
results in a circular dependency.
Counter this by dimensioning the initial IDMAP page tables based on a
new boundary marker 'kimage_limit', and define it such that its value
should not change as a result of the initdata segment being pushed over
a 64k segment boundary due to changes in INIT_IDMAP_DIR_SIZE, provided
that its value doesn't change by more than 2M between linker passes.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250531123005.3866382-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
It turns out [1] that the way LLD handles ASSERT()s in the linker script
can result in spurious failures, so disable them for the newly
introduced BSS symbol export checks. Since we're not aware of any issues
with the existing assertions in vmlinux.lds.S, leave those alone for now
so that they can continue to provide useful coverage.
A linker fix [2] is due to land in version 21 of LLD.
Link: https://lore.kernel.org/r/202505261019.OUlitN6m-lkp@intel.com [1]
Link: https://github.com/llvm/llvm-project/commit/5859863bab7f [2]
Link: https://github.com/ClangBuiltLinux/linux/issues/2094
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505261019.OUlitN6m-lkp@intel.com/
Link: https://lore.kernel.org/r/20250529073507.2984959-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
* for-next/vdso:
arm64: vdso: Use __arch_counter_get_cntvct()
|
|
* for-next/sme-fixes: (35 commits)
arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected
arm64/fpsimd: ptrace: Gracefully handle errors
arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state
arm64/fpsimd: ptrace: Do not present register data for inactive mode
arm64/fpsimd: ptrace: Save task state before generating SVE header
arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data
arm64/fpsimd: Make clone() compatible with ZA lazy saving
arm64/fpsimd: Clear PSTATE.SM during clone()
arm64/fpsimd: Consistently preserve FPSIMD state during clone()
arm64/fpsimd: Remove redundant task->mm check
arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()
arm64/fpsimd: Add task_smstop_sm()
arm64/fpsimd: Factor out {sve,sme}_state_size() helpers
arm64/fpsimd: Clarify sve_sync_*() functions
arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE
arm64/fpsimd: signal: Consistently read FPSIMD context
arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state
arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only
arm64/fpsimd: Do not discard modified SVE state
...
|
|
* for-next/selftests:
kselftest/arm64: Set default OUTPUT path when undefined
kselftest/arm64: fp-ptrace: Adjust to new inactive mode behaviour
kselftest/arm64: fp-ptrace: Adjust to new VL change behaviour
kselftest/arm64: tpidr2: Adjust to new clone() behaviour
kselftest/arm64: fp-ptrace: Fix expected FPMR value when PSTATE.SM is changed
|
|
* for-next/psci:
firmware: psci: Fix refcount leak in psci_dt_init
|
|
* for-next/perf:
perf/arm-cmn: Add CMN S3 ACPI binding
perf/arm-cmn: Initialise cmn->cpu earlier
perf/amlogic: Replace smp_processor_id() with raw_smp_processor_id() in meson_ddr_pmu_create()
perf/arm-cmn: Fix REQ2/SNP2 mixup
perf: Do not enable by default during compile testing
perf: arm-ni: Fix missing platform_set_drvdata()
perf: arm-ni: Unregister PMUs on probe failure
perf/arm-cmn: Remove CMN-600 DTC domain special case
|
|
* for-next/mm:
arm64/boot: Disallow BSS exports to startup code
arm64/boot: Move global CPU override variables out of BSS
arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
arm64: mm: Drop redundant check in pmd_trans_huge()
arm64/mm: Permit lazy_mmu_mode to be nested
arm64/mm: Disable barrier batching in interrupt contexts
arm64/mm: Batch barriers when updating kernel mappings
mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
arm64/mm: Support huge pte-mapped pages in vmap
mm/vmalloc: Gracefully unmap huge ptes
mm/vmalloc: Warn on improper use of vunmap_range()
arm64/mm: Hoist barriers out of set_ptes_anysz() loop
arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz()
arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear()
mm/page_table_check: Batch-check pmds/puds just like ptes
arm64: hugetlb: Refine tlb maintenance scope
arm64: hugetlb: Cleanup huge_pte size discovery mechanisms
arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings
arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX
arm64/mm: Remove randomization of the linear map
|
|
* for-next/misc:
arm64/cpuinfo: only show one cpu's info in c_show()
arm64: Extend pr_crit message on invalid FDT
arm64: Kconfig: remove unnecessary selection of CRC32
arm64: Add missing includes for mem_encrypt
|
|
Merge in for-next/fixes, as subsequent improvements to our early PI
code that disallow BSS exports depend on the 'arm64_use_ng_mappings'
fix here.
* for-next/fixes:
arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation
arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
|
|
* for-next/entry:
arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again
arm64: Update comment regarding values in __boot_cpu_mode
arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1
arm64: enable PREEMPT_LAZY
|
|
* for-next/efi:
arm64/fpsimd: Avoid unnecessary per-CPU buffers for EFI runtime calls
|
|
* for-next/cpufeature:
arm64: cputype: Add cputype definition for HIP12
arm64: Expose AIDR_EL1 via sysfs
arm64/cpufeature: Add missing id_aa64mmfr4 feature reg update
|
|
* for-next/acpi:
firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES
|
|
Add MIDR encoding for HiSilicon HIP12 which is used on HiSilicon
HIP12 SoCs.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250425033845.57671-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 5b39db6037e7 ("arm64: el2_setup.h: Rename some labels to be more
diff-friendly") reworked the labels in __init_el2_fgt to say what's
skipped rather than what the target location is. The exception was
"set_fgt_" which is where registers are written. In reviewing the BRBE
additions, Will suggested "set_debug_fgt_" where HDFGxTR_EL2 are
written. Doing that would partially revert commit 5b39db6037e7 undoing
the goal of minimizing additions here, but it would follow the
convention for labels where registers are written.
So let's do both. Branches that skip something go to a "skip" label and
places that set registers have a "set" label. This results in some
double labels, but it makes things entirely consistent.
While we're here, the SME skip label was incorrectly named, so fix it.
Reported-by: Will Deacon <will@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250520-arm-brbe-v19-v22-2-c1ddde38e7f8@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
An ACPI binding for CMN S3 was not yet finalised when the driver support
was originally written, but v1.2 of DEN0093 "ACPI for Arm Components"
has at last been published; support ACPI systems using the proper HID.
Cc: stable@vger.kernel.org
Fixes: 0dc2f4963f7e ("perf/arm-cmn: Support CMN S3")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/7dafe147f186423020af49d7037552ee59c60e97.1747652164.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
BSS might be uninitialized when entering the startup code, so forbid the
use by the startup code of any variables that live after __bss_start in
the linker map.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250508114328.2460610-8-ardb+git@google.com
[will: Drop export of 'memstart_offset_seed', as this has been removed]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Accessing BSS will no longer be permitted from the startup code in
arch/arm64/kernel/pi, as some of it executes before BSS is cleared.
Clearing BSS earlier would involve managing cache coherency explicitly
in software, which is a hassle we prefer to avoid.
So move some variables that are assigned by the startup code out of BSS
and into .data.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250508114328.2460610-7-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
init_pgdir[] is only referenced from the startup code, but lives after
BSS in the linker map. Before tightening the rules about accessing BSS
from startup code, move init_pgdir[] into the __pi_ namespace, so it
does not need to be exported explicitly.
For symmetry, do the same with init_idmap_pgdir[], although it lives
before BSS.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250508114328.2460610-6-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For all the complexity of handling affinity for CPU hotplug, what we've
apparently managed to overlook is that arm_cmn_init_irqs() has in fact
always been setting the *initial* affinity of all IRQs to CPU 0, not the
CPU we subsequently choose for event scheduling. Oh dear.
Cc: stable@vger.kernel.org
Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/b12fccba6b5b4d2674944f59e4daad91cd63420b.1747069914.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When running 'make' in tools/testing/selftests/arm64/ without explicitly
setting the OUTPUT variable, the build system will creates test
directories (e.g., /bti) in the root filesystem due to OUTPUT defaulting
to an empty string. This causes unintended pollution of the root directory.
This patch adds proper handling for the OUTPUT variable: Sets OUTPUT
to the current directory (.) if not specified
Signed-off-by: tanze <tanze@kylinos.cn>
Link: https://lore.kernel.org/r/20250515051839.3409658-1-tanze@kylinos.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The values stored in __boot_cpu_mode were changed without updating the
comment. Rectify that.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20250513124525.677736-1-ben.horgan@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pmd_val(pmd) is redundant because a positive pmd_present(pmd) ensures
a positive pmd_val(pmd) according to their definitions like below.
#define pmd_val(x) ((x).pmd)
#define pmd_present(pmd) pte_present(pmd_pte(pmd))
#define pte_present(pte) (pte_valid(pte) || pte_present_invalid(pte))
#define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
#define pte_present_invalid(pte) \
((pte_val(pte) & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)
pte_present() can't be positive unless either of the flag PTE_VALID or
PTE_PRESENT_INVALID is set. In this case, pmd_val(pmd) should be positive
either.
So lets drop the redundant check pmd_val(pmd) and no functional changes
intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250508085251.204282-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
mov_q cannot really move PIE_E[0|1] macros into a general purpose register
as expected if those macro constants contain some 128 bit layout elements,
that are required for D128 page tables. The primary issue is that for D128,
PIE_E[0|1] are defined in terms of 128-bit types with shifting and masking,
which the assembler can't accommodate.
Instead pre-calculate these PIRE0_EL1/PIR_EL1 constants into asm-offsets.h
based PIE_E0_ASM/PIE_E1_ASM which can then be used in arch/arm64/mm/proc.S.
While here also drop PTE_MAYBE_NG/PTE_MAYBE_SHARED assembly overrides which
are not required any longer, as the compiler toolchains are smart enough to
compute both the PIE_[E0|E1]_ASM constants in all scenarios.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250429050511.1663235-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
lazy_mmu_mode is not supposed to permit nesting. But in practice this
does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation inside
a lazy_mmu_mode section (such as zap_pte_range()) will change
permissions on the linear map with apply_to_page_range(), which
re-enters lazy_mmu_mode (see stack trace below).
The warning checking that nesting was not happening was previously being
triggered due to this. So let's relax by removing the warning and
tolerate nesting in the arm64 implementation. The first (inner) call to
arch_leave_lazy_mmu_mode() will flush and clear the flag such that the
remainder of the work in the outer nest behaves as if outside of lazy
mmu mode. This is safe and keeps tracking simple.
Code review suggests powerpc deals with this issue in the same way.
------------[ cut here ]------------
WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8
Modules linked in: ip_tables x_tables ipv6
CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __apply_to_page_range+0x85c/0x9f8
lr : __apply_to_page_range+0x2b4/0x9f8
sp : ffff80008009b3c0
x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108
x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001
x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20
x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003
x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff
x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018
x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000
x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000
x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000
Call trace:
__apply_to_page_range+0x85c/0x9f8 (P)
apply_to_page_range+0x14/0x20
set_memory_valid+0x5c/0xd8
__kernel_map_pages+0x84/0xc0
get_page_from_freelist+0x1110/0x1340
__alloc_frozen_pages_noprof+0x114/0x1178
alloc_pages_mpol+0xb8/0x1d0
alloc_frozen_pages_noprof+0x48/0xc0
alloc_pages_noprof+0x10/0x60
get_free_pages_noprof+0x14/0x90
__tlb_remove_folio_pages_size.isra.0+0xe4/0x140
__tlb_remove_folio_pages+0x10/0x20
unmap_page_range+0xa1c/0x14c0
unmap_single_vma.isra.0+0x48/0x90
unmap_vmas+0xe0/0x200
vms_clear_ptes+0xf4/0x140
vms_complete_munmap_vmas+0x7c/0x208
do_vmi_align_munmap+0x180/0x1a8
do_vmi_munmap+0xac/0x188
__vm_munmap+0xe0/0x1e0
__arm64_sys_munmap+0x20/0x38
invoke_syscall+0x48/0x104
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x4c/0x16c
el0t_64_sync_handler+0x10c/0x140
el0t_64_sync+0x198/0x19c
irq event stamp: 281312
hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0
hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98
softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518
softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20
---[ end trace 0000000000000000 ]---
Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Closes: https://lore.kernel.org/linux-arm-kernel/aCH0TLRQslXHin5Q@arm.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250512150333.5589-1-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
flags in order to defer barriers until exiting the mode. At the same
time, it added warnings to check that pte manipulations were never
performed in interrupt context, because the tracking implementation
could not deal with nesting.
But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
do manipulate ptes in softirq context, which triggered the warnings.
So let's take the simplest and safest route and disable the batching
optimization in interrupt contexts. This makes these users no worse off
than prior to the optimization. Additionally the known offenders are
debug features that only manipulate a single PTE, so there is no
performance gain anyway.
There may be some obscure case of encrypted/decrypted DMA with the
dma_free_coherent called from an interrupt context, but again, this is
no worse off than prior to the commit.
Some options for supporting nesting were considered, but there is a
difficult to solve problem if any code manipulates ptes within interrupt
context but *outside of* a lazy mmu region. If this case exists, the
code would expect the updates to be immediate, but because the task
context may have already been in lazy mmu mode, the updates would be
deferred, which could cause incorrect behaviour. This problem is avoided
by always ensuring updates within interrupt context are immediate.
Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250512102242.4156463-1-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, when ARM64 displays CPU information, every call to c_show()
assembles all CPU information. However, as the number of CPUs increases,
this can lead to insufficient buffer space due to excessive assembly in
a single call, causing repeated expansion and multiple calls to c_show().
To prevent this invalid c_show() call, only one CPU's information is
assembled each time c_show() is called.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20250421062947.4072855-1-yebin@huaweicloud.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Because the kernel can't tolerate page faults for kernel mappings, when
setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits a
dsb(ishst) to ensure that the store to the pgtable is observed by the
table walker immediately. Additionally it emits an isb() to ensure that
any already speculatively determined invalid mapping fault gets
canceled.
We can improve the performance of vmalloc operations by batching these
barriers until the end of a set of entry updates.
arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() provide the
required hooks.
vmalloc improves by up to 30% as a result.
Two new TIF_ flags are created; TIF_LAZY_MMU tells us if the task is in
the lazy mode and can therefore defer any barriers until exit from the
lazy mode. TIF_LAZY_MMU_PENDING is used to remember if any pte operation
was performed while in the lazy mode that required barriers. Then when
leaving lazy mode, if that flag is set, we emit the barriers.
Since arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() are used
for both user and kernel mappings, we need the second flag to avoid
emitting barriers unnecessarily if only user mappings were updated.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-12-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Wrap vmalloc's pte table manipulation loops with
arch_enter_lazy_mmu_mode() / arch_leave_lazy_mmu_mode(). This provides
the arch code with the opportunity to optimize the pte manipulations.
Note that vmap_pfn() already uses lazy mmu mode since it delegates to
apply_to_page_range() which enters lazy mmu mode for both user and
kernel mappings.
These hooks will shortly be used by arm64 to improve vmalloc
performance.
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-11-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Implement the required arch functions to enable use of contpte in the
vmap when VM_ALLOW_HUGE_VMAP is specified. This speeds up vmap
operations due to only having to issue a DSB and ISB per contpte block
instead of per pte. But it also means that the TLB pressure reduces due
to only needing a single TLB entry for the whole contpte block.
Since vmap uses set_huge_pte_at() to set the contpte, that API is now
used for kernel mappings for the first time. Although in the vmap case
we never expect it to be called to modify a valid mapping so
clear_flush() should never be called, it's still wise to make it robust
for the kernel case, so amend the tlb flush function if the mm is for
kernel space.
Tested with vmalloc performance selftests:
# kself/mm/test_vmalloc.sh \
run_test_mask=1
test_repeat_count=5
nr_pages=256
test_loop_count=100000
use_huge=1
Duration reduced from 1274243 usec to 1083553 usec on Apple M2 for 15%
reduction in time taken.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-10-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit f7ee1f13d606 ("mm/vmalloc: enable mapping of huge pages at pte
level in vmap") added its support by reusing the set_huge_pte_at() API,
which is otherwise only used for user mappings. But when unmapping those
huge ptes, it continued to call ptep_get_and_clear(), which is a
layering violation. To date, the only arch to implement this support is
powerpc and it all happens to work ok for it.
But arm64's implementation of ptep_get_and_clear() can not be safely
used to clear a previous set_huge_pte_at(). So let's introduce a new
arch opt-in function, arch_vmap_pte_range_unmap_size(), which can
provide the size of a (present) pte. Then we can call
huge_ptep_get_and_clear() to tear it down properly.
Note that if vunmap_range() is called with a range that starts in the
middle of a huge pte-mapped page, we must unmap the entire huge page so
the behaviour is consistent with pmd and pud block mappings. In this
case emit a warning just like we do for pmd/pud mappings.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-9-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
pud level. But it is possible to subsequently call vunmap_range() on a
sub-range of the mapped memory, which partially overlaps a pmd or pud.
In this case, vmalloc unmaps the entire pmd or pud so that the
no-overlapping portion is also unmapped. Clearly that would have a bad
outcome, but it's not something that any callers do today as far as I
can tell. So I guess it's just expected that callers will not do this.
However, it would be useful to know if this happened in future; let's
add a warning to cover the eventuality.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-8-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
set_ptes_anysz() previously called __set_pte() for each PTE in the
range, which would conditionally issue a DSB and ISB to make the new PTE
value immediately visible to the table walker if the new PTE was valid
and for kernel space.
We can do better than this; let's hoist those barriers out of the loop
so that they are only issued once at the end of the loop. We then reduce
the cost by the number of PTEs in the range.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-7-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Refactor the huge_pte helpers to use the new common __set_ptes_anysz()
and __ptep_get_and_clear_anysz() APIs.
This provides 2 benefits; First, when page_table_check=on, hugetlb is
now properly/fully checked. Previously only the first page of a hugetlb
folio was checked. Second, instead of having to call __set_ptes(nr=1)
for each pte in a loop, the whole contiguous batch can now be set in one
go, which enables some efficiencies and cleans up the code.
One detail to note is that huge_ptep_clear_flush() was previously
calling ptep_clear_flush() for a non-contiguous pte (i.e. a pud or pmd
block mapping). This has a couple of disadvantages; first
ptep_clear_flush() calls ptep_get_and_clear() which transparently
handles contpte. Given we only call for non-contiguous ptes, it would be
safe, but a waste of effort. It's preferable to go straight to the layer
below. However, more problematic is that ptep_get_and_clear() is for
PAGE_SIZE entries so it calls page_table_check_pte_clear() and would not
clear the whole hugetlb folio. So let's stop special-casing the non-cont
case and just rely on get_clear_contig_flush() to do the right thing for
non-cont entries.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-6-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Refactor __set_ptes(), set_pmd_at() and set_pud_at() so that they are
all a thin wrapper around a new common __set_ptes_anysz(), which takes
pgsize parameter. Additionally, refactor __ptep_get_and_clear() and
pmdp_huge_get_and_clear() to use a new common
__ptep_get_and_clear_anysz() which also takes a pgsize parameter.
These changes will permit the huge_pte API to efficiently batch-set
pgtable entries and take advantage of the future barrier optimizations.
Additionally since the new *_anysz() helpers call the correct
page_table_check_*_set() API based on pgsize, this means that huge_ptes
will be able to get proper coverage. Currently the huge_pte API always
uses the pte API which assumes an entry only covers a single page.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-5-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Convert page_table_check_p[mu]d_set(...) to
page_table_check_p[mu]ds_set(..., nr) to allow checking a contiguous set
of pmds/puds in single batch. We retain page_table_check_p[mu]d_set(...)
as macros that call new batch functions with nr=1 for compatibility.
arm64 is about to reorganise its pte/pmd/pud helpers to reuse more code
and to allow the implementation for huge_pte to more efficiently set
ptes/pmds/puds in batches. We need these batch-helpers to make the
refactoring possible.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When operating on contiguous blocks of ptes (or pmds) for some hugetlb
sizes, we must honour break-before-make requirements and clear down the
block to invalid state in the pgtable then invalidate the relevant tlb
entries before making the pgtable entries valid again.
However, the tlb maintenance is currently always done assuming the worst
case stride (PAGE_SIZE), last_level (false) and tlb_level
(TLBI_TTL_UNKNOWN). We can do much better with the hinting; In reality,
we know the stride from the huge_pte pgsize, we are always operating
only on the last level, and we always know the tlb_level, again based on
pgsize. So let's start providing these hints.
Additionally, avoid tlb maintenace in set_huge_pte_at().
Break-before-make is only required if we are transitioning the
contiguous pte block from valid -> valid. So let's elide the
clear-and-flush ("break") if the pte range was previously invalid.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-3-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Not all huge_pte helper APIs explicitly provide the size of the
huge_pte. So the helpers have to depend on various methods to determine
the size of the huge_pte. Some of these methods are dubious.
Let's clean up the code to use preferred methods and retire the dubious
ones. The options in order of preference:
- If size is provided as parameter, use it together with
num_contig_ptes(). This is explicit and works for both present and
non-present ptes.
- If vma is provided as a parameter, retrieve size via
huge_page_size(hstate_vma(vma)) and use it together with
num_contig_ptes(). This is explicit and works for both present and
non-present ptes.
- If the pte is present and contiguous, use find_num_contig() to walk
the pgtable to find the level and infer the number of ptes from
level. Only works for *present* ptes.
- If the pte is present and not contiguous and you can infer from this
that only 1 pte needs to be operated on. This is ok if you don't care
about the absolute size, and just want to know the number of ptes.
- NEVER rely on resolving the PFN of a present pte to a folio and
getting the folio's size. This is fragile at best, because there is
nothing to stop the core-mm from allocating a folio twice as big as
the huge_pte then mapping it across 2 consecutive huge_ptes. Or just
partially mapping it.
Where we require that the pte is present, add warnings if not-present.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
meson_ddr_pmu_create()
The Amlogic DDR PMU driver meson_ddr_pmu_create() function incorrectly uses
smp_processor_id(), which assumes disabled preemption. This leads to kernel
warnings during module loading because meson_ddr_pmu_create() can be called
in a preemptible context.
Following kernel warning and stack trace:
[ 31.745138] [ T2289] BUG: using smp_processor_id() in preemptible [00000000] code: (udev-worker)/2289
[ 31.745154] [ T2289] caller is debug_smp_processor_id+0x28/0x38
[ 31.745172] [ T2289] CPU: 4 UID: 0 PID: 2289 Comm: (udev-worker) Tainted: GW 6.14.0-0-MANJARO-ARM #1 59519addcbca6ba8de735e151fd7b9e97aac7ff0
[ 31.745181] [ T2289] Tainted: [W]=WARN
[ 31.745183] [ T2289] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ 31.745188] [ T2289] Call trace:
[ 31.745191] [ T2289] show_stack+0x28/0x40 (C)
[ 31.745199] [ T2289] dump_stack_lvl+0x4c/0x198
[ 31.745205] [ T2289] dump_stack+0x20/0x50
[ 31.745209] [ T2289] check_preemption_disabled+0xec/0xf0
[ 31.745213] [ T2289] debug_smp_processor_id+0x28/0x38
[ 31.745216] [ T2289] meson_ddr_pmu_create+0x200/0x560 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745237] [ T2289] g12_ddr_pmu_probe+0x20/0x38 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745246] [ T2289] platform_probe+0x98/0xe0
[ 31.745254] [ T2289] really_probe+0x144/0x3f8
[ 31.745258] [ T2289] __driver_probe_device+0xb8/0x180
[ 31.745261] [ T2289] driver_probe_device+0x54/0x268
[ 31.745264] [ T2289] __driver_attach+0x11c/0x288
[ 31.745267] [ T2289] bus_for_each_dev+0xfc/0x160
[ 31.745274] [ T2289] driver_attach+0x34/0x50
[ 31.745277] [ T2289] bus_add_driver+0x160/0x2b0
[ 31.745281] [ T2289] driver_register+0x78/0x120
[ 31.745285] [ T2289] __platform_driver_register+0x30/0x48
[ 31.745288] [ T2289] init_module+0x30/0xfe0 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745298] [ T2289] do_one_initcall+0x11c/0x438
[ 31.745303] [ T2289] do_init_module+0x68/0x228
[ 31.745311] [ T2289] load_module+0x118c/0x13a8
[ 31.745315] [ T2289] __arm64_sys_finit_module+0x274/0x390
[ 31.745320] [ T2289] invoke_syscall+0x74/0x108
[ 31.745326] [ T2289] el0_svc_common+0x90/0xf8
[ 31.745330] [ T2289] do_el0_svc+0x2c/0x48
[ 31.745333] [ T2289] el0_svc+0x60/0x150
[ 31.745337] [ T2289] el0t_64_sync_handler+0x80/0x118
[ 31.745341] [ T2289] el0t_64_sync+0x1b8/0x1c0
Changes replaces smp_processor_id() with raw_smp_processor_id() to
ensure safe CPU ID retrieval in preemptible contexts.
Cc: Jiucheng Xu <jiucheng.xu@amlogic.com>
Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver")
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Link: https://lore.kernel.org/r/20250407063206.5211-1-linux.amoon@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Somehow the encodings for REQ2/SNP2 channels in XP events
got mixed up... Unmix them.
CC: stable@vger.kernel.org
Fixes: 23760a014417 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to fix an ABI problem, we recently changed the way that reads
of the NT_ARM_SVE and NT_ARM_SSVE regsets behave when their
corresponding vector state is inactive.
Update the fp-ptrace test for the new behaviour.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-25-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to fix an ABI problem, we recently changed the way that
changing the SVE/SME vector length affects PSTATE.SM. Historically,
changing the SME vector length would clear PSTATE.SM. Now, changing the
SME vector length preserves PSTATE.SM.
Update the fp-ptrace test for the new behaviour.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-24-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to fix an ABI problem, we recently changed the way that a
clone() syscall manipulates TPIDR2 and PSTATE.ZA. Historically the child
would inherit the parent's TPIDR2 value unless CLONE_SETTLS was set, and
now the child will inherit the parent's TPIDR2 value unless CLONE_VM is
set.
Update the tpidr2 test for the new behaviour.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Cc: Sander De Smalen <sander.desmalen@arm.com>
Cc: Tamas Petz <tamas.petz@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yury Khrustalev <yury.khrustalev@arm.com>
Link: https://lore.kernel.org/r/20250508132644.1395904-23-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The fp-ptrace test suite expects that FPMR is set to zero when PSTATE.SM
is changed via ptrace, but ptrace has never altered FPMR in this way,
and the test logic erroneously relies upon (and has concealed) a bug
where task_fpsimd_load() would unexpectedly and non-deterministically
clobber FPMR.
Using ptrace, FPMR can only be altered by writing to the NT_ARM_FPMR
regset. The value of PSTATE.SM can be altered by writing to the
NT_ARM_SVE or NT_ARM_SSVE regsets, and/or by changing the SME vector
length (when writing to the NT_ARM_SVE, NT_ARM_SSVE, or NT_ARM_ZA
regsets), but none of these writes will change the value of FPMR.
The task_fpsimd_load() bug was introduced with the initial FPMR support
in commit:
203f2b95a882 ("arm64/fpsimd: Support FEAT_FPMR")
The incorrect FPMR test code was introduced in commit:
7dbd26d0b22d ("kselftest/arm64: Add FPMR coverage to fp-ptrace")
Subsequently, the task_fpsimd_load() bug was fixed in commit:
e5fa85fce08b ("arm64/fpsimd: Don't corrupt FPMR when streaming mode changes")
... whereupon the fp-ptrace FPMR tests started failing reliably, e.g.
| # # Mismatch in saved FPMR: 915058000 != 0
| # not ok 25 SVE write, SVE 64->64, SME 64/0->64/1
Fix this by changing the test to expect that FPMR is *NOT* changed when
PSTATE.SM is changed via ptrace, matching the extant behaviour.
I've chosen to update the test code rather than modifying ptrace to zero
FPMR when PSTATE.SM changes. Not zeroing FPMR is simpler overall, and
allows the NT_ARM_FPMR regset to be handled independently from other
regsets, leaving less scope for error.
Fixes: 7dbd26d0b22d ("kselftest/arm64: Add FPMR coverage to fp-ptrace")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-22-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Now that the known issues with SME have been addressed, allow SME to be
selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Cc: Sander De Smalen <sander.desmalen@arm.com>
Cc: Tamas Petz <tamas.petz@arm.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yury Khrustalev <yury.khrustalev@arm.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Link: https://lore.kernel.org/r/20250508132644.1395904-21-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Within sve_set_common() we do not handle error conditions correctly:
* When writing to NT_ARM_SSVE, if sme_alloc() fails, the task will be
left with task->thread.sme_state==NULL, but TIF_SME will be set and
task->thread.fp_type==FP_STATE_SVE. This will result in a subsequent
null pointer dereference when the task's state is loaded or otherwise
manipulated.
* When writing to NT_ARM_SSVE, if sve_alloc() fails, the task will be
left with task->thread.sve_state==NULL, but TIF_SME will be set,
PSTATE.SM will be set, and task->thread.fp_type==FP_STATE_FPSIMD.
This is not a legitimate state, and can result in various problems,
including a subsequent null pointer dereference and/or the task
inheriting stale streaming mode register state the next time its state
is loaded into hardware.
* When writing to NT_ARM_SSVE, if the VL is changed but the resulting VL
differs from that in the header, the task will be left with TIF_SME
set, PSTATE.SM set, but task->thread.fp_type==FP_STATE_FPSIMD. This is
not a legitimate state, and can result in various problems as
described above.
Avoid these problems by allocating memory earlier, and by changing the
task's saved fp_type to FP_STATE_SVE before skipping register writes due
to a change of VL.
To make early returns simpler, I've moved the call to
fpsimd_flush_task_state() earlier. As the tracee's state has already
been saved, and the tracee is known to be blocked for the duration of
sve_set_common(), it doesn't matter whether this is called at the start
or the end.
For consistency I've moved the setting of TIF_SVE earlier. This will be
cleared when loading FPSIMD-only state, and so moving this has no
resulting functional change.
Note that we only allocate the memory for SVE state when SVE register
contents are provided, avoiding unnecessary memory allocations for tasks
which only use FPSIMD.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Fixes: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE")
Fixes: 5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-20-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When a task has PSTATE.SM==1, reads of NT_ARM_SSVE are required to
always present a header with SVE_PT_REGS_SVE, and register data in SVE
format. Reads of NT_ARM_SSVE must never present register data in FPSIMD
format. Within the kernel, we always expect streaming SVE data to be
stored in SVE format.
Currently a user can write to NT_ARM_SSVE with a header presenting
SVE_PT_REGS_FPSIMD rather than SVE_PT_REGS_SVE, placing the task's
FPSIMD/SVE data into an invalid state.
To fix this we can either:
(a) Forbid such writes.
(b) Accept such writes, and immediately convert data into SVE format.
Take the simple option and forbid such writes.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-19-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The SME ptrace ABI is written around the incorrect assumption that
SVE_PT_REGS_FPSIMD and SVE_PT_REGS_SVE are independent bit flags, where
it is possible for both to be clear. In reality they are different
values for bit 0 of the header flags, where SVE_PT_REGS_FPSIMD is 0 and
SVE_PT_REGS_SVE is 1. In cases where code was written expecting that
neither bit flag would be set, the value is equivalent to
SVE_PT_REGS_FPSIMD.
One consequence of this is that reads of the NT_ARM_SVE or NT_ARM_SSVE
will erroneously present data from the other mode:
* When PSTATE.SM==1, reads of NT_ARM_SVE will present a header with
SVE_PT_REGS_FPSIMD, and FPSIMD-formatted data from streaming mode.
* When PSTATE.SM==0, reads of NT_ARM_SSVE will present a header with
SVE_PT_REGS_FPSIMD, and FPSIMD-formatted data from non-streaming mode.
The original intent was that no register data would be provided in these
cases, as described in commit:
e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Luckily, debuggers do not consume the bogus register data. Both GDB and
LLDB read the NT_ARM_SSVE regset before the NT_ARM_SVE regset, and
assume that when the NT_ARM_SSVE header presents SVE_PT_REGS_FPSIMD, it
is necessary to read register contents from the NT_ARM_SVE regset,
regardless of whether the NT_ARM_SSVE regset provided bogus register
data.
Fix the code to stop presenting register data from the inactive mode.
At the same time, make the manipulation of the flag clearer, and remove
the bogus comment from sve_set_common(). I've given this a quick spin
with GDB and LLDB, and both seem happy.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-18-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As sve_init_header_from_task() consumes the saved value of PSTATE.SM and
the saved fp_type, both must be saved before the header is generated.
When generating a coredump for the current task, sve_get_common() calls
sve_init_header_from_task() before saving the task's state. Consequently
the header may be bogus, and the contents of the regset may be
misleading.
Fix this by saving the task's state before generting the header.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Fixes: b017a0cea627 ("arm64/ptrace: Use saved floating point state type to determine SVE layout")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Spickett <david.spickett@arm.com>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-17-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|