Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Standardize on the __ASSEMBLER__ macro that is provided by GCC and
Clang compilers and replace __ASSEMBLY__ with __ASSEMBLER__ in both
uapi and non-uapi headers
- Explicitly include <linux/export.h> in architecture and driver files
which contain an EXPORT_SYMBOL() and remove the include from the
files which do not contain the EXPORT_SYMBOL()
- Use the full title of "z/Architecture Principles of Operation" manual
and the name of a section where facility bits are listed
- Use -D__DISABLE_EXPORTS for files in arch/s390/boot to avoid
unnecessary slowing down of the build and confusing external kABI
tools that process symtypes data
- Print additional unrecoverable machine check information to make the
root cause analysis easier
- Move cmpxchg_user_key() handling to uaccess library code, since the
generated code is large anyway and there is no benefit if it is
inlined
- Fix a problem when cmpxchg_user_key() is executing a code with a
non-default key: if a system is IPL-ed with "LOAD NORMAL", and the
previous system used storage keys where the fetch-protection bit was
set for some pages, and the cmpxchg_user_key() is located within such
page, a protection exception happens
- Either the external call or emergency signal order is used to send an
IPI to a remote CPU. Use the external order only, since it is at
least as good and sometimes even better, than the emergency signal
- In case of an early crash the early program check handler prints more
or less random value of the last breaking event address, since it is
not initialized properly. Copy the last breaking event address from
the lowcore to pt_regs to address this
- During STP synchronization check udelay() can not be used, since the
first CPU modifies tod_clock_base and get_tod_clock_monotonic() might
return a non-monotonic time. Instead, busy-loop on other CPUs, while
the the first CPU actually handles the synchronization operation
- When debugging the early kernel boot using QEMU with the -S flag and
GDB attached, skip the decompressor and start directly in kernel
- Rename PAI Crypto event 4210 according to z16 and z17 "z/Architecture
Principles of Operation" manual
- Remove the in-kernel time steering support in favour of the new s390
PTP driver, which allows the kernel clock steered more precisely
- Remove a possible false-positive warning in pte_free_defer(), which
could be triggered in a valid case KVM guest process is initializing
* tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
s390/mm: Remove possible false-positive warning in pte_free_defer()
s390/stp: Default to enabled
s390/stp: Remove leap second support
s390/time: Remove in-kernel time steering
s390/sclp: Use monotonic clock in sclp_sync_wait()
s390/smp: Use monotonic clock in smp_emergency_stop()
s390/time: Use monotonic clock in get_cycles()
s390/pai_crypto: Rename PAI Crypto event 4210
scripts/gdb/symbols: make lx-symbols skip the s390 decompressor
s390/boot: Introduce jump_to_kernel() function
s390/stp: Remove udelay from stp_sync_clock()
s390/early: Copy last breaking event address to pt_regs
s390/smp: Remove conditional emergency signal order code usage
s390/uaccess: Merge cmpxchg_user_key() inline assemblies
s390/uaccess: Prevent kprobes on cmpxchg_user_key() functions
s390/uaccess: Initialize code pages executed with non-default access key
s390/skey: Provide infrastructure for executing with non-default access key
s390/uaccess: Make cmpxchg_user_key() library code
s390/page: Add memory clobber to page_set_storage_key()
s390/page: Cleanup page_set_storage_key() inline assemblies
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair scheduler
(Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout the
entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's data structures
and logic on UP kernel too, even though they are not used, to
simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
blocks from the scheduler (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling for better
interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
Mason)
- Remove sched_domain_topology_level::flags to simplify the code
(Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for
mutex-owning tasks to inherit the scheduling context of higher
priority waiters.
Currently limited to a single runqueue and conditional on
CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri
Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive
dl_server handling (Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock()
usage (Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid
unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter
Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri
Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info
(Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)"
* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
sched/idle: Remove play_idle()
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
sched: Start blocked_on chain processing in find_proxy_task()
sched: Fix proxy/current (push,pull)ability
sched: Add an initial sketch of the find_proxy_task() function
sched: Fix runtime accounting w/ split exec & sched contexts
sched: Move update_curr_task logic into update_curr_se
locking/mutex: Add p->blocked_on wrappers for correctness checks
locking/mutex: Rework task_struct::blocked_on
sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
sched/topology: Remove sched_domain_topology_level::flags
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
x86/smpboot: moves x86_topology to static initialize and truncate
x86/smpboot: remove redundant CONFIG_SCHED_SMT
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
tools/sched: Add root_domains_dump.py which dumps root domains info
sched/deadline: Fix accounting after global limits change
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Initialize dl_servers after SMP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
sysfs:
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
Support cache-ids for device-tree systems:
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
Rust:
- Device:
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres:
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID:
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA:
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O:
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc:
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
Misc:
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()"
* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
rust: io: fix broken intra-doc links to `platform::Device`
rust: io: fix broken intra-doc link to missing `flags` module
rust: io: mem: enable IoRequest doc-tests
rust: platform: add resource accessors
rust: io: mem: add a generic iomem abstraction
rust: io: add resource abstraction
rust: samples: dma: set DMA mask
rust: platform: implement the `dma::Device` trait
rust: pci: implement the `dma::Device` trait
rust: dma: add DMA addressing capabilities
rust: dma: implement `dma::Device` trait
rust: net::phy Change module_phy_driver macro to use module_device_table macro
rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
rust: device_id: split out index support into a separate trait
device: rust: rename Device::as_ref() to Device::from_raw()
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
cacheinfo: Set cache 'id' based on DT data
container_of: Document container_of() is not to be used in new code
driver core: auxiliary bus: fix OF node leak
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Introduce and start using TRAILING_OVERLAP() helper for fixing
embedded flex array instances (Gustavo A. R. Silva)
- mux: Convert mux_control_ops to a flex array member in mux_chip
(Thorsten Blum)
- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
Kees Cook)
- Refactor and rename stackleak feature to support Clang
- Add KUnit test for seq_buf API
- Fix KUnit fortify test under LTO
* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
sched/task_stack: Add missing const qualifier to end_of_stack()
kstack_erase: Support Clang stack depth tracking
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
init.h: Disable sanitizer coverage for __init and __head
kstack_erase: Disable kstack_erase for all of arm compressed boot code
x86: Handle KCOV __init vs inline mismatches
arm64: Handle KCOV __init vs inline mismatches
s390: Handle KCOV __init vs inline mismatches
arm: Handle KCOV __init vs inline mismatches
mips: Handle KCOV __init vs inline mismatch
powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
configs/hardening: Enable CONFIG_KSTACK_ERASE
stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
stackleak: Rename STACKLEAK to KSTACK_ERASE
seq_buf: Introduce KUnit tests
string: Group str_has_prefix() and strstarts()
kunit/fortify: Add back "volatile" for sizeof() constants
acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Introduce regular REGSET note macros arch-wide (Dave Martin)
- Remove arbitrary 4K limitation of program header size (Yin Fengwei)
- Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)
* tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits)
fork: reorder function qualifiers for copy_clone_args_from_user
binfmt_elf: remove the 4k limitation of program header size
binfmt_elf: Warn on missing or suspicious regset note names
xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
...
|
|
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:
- Add the new top-level CONFIG_KSTACK_ERASE option which will be
implemented either with the stackleak GCC plugin, or with the Clang
stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
for what it does rather than what it protects against), but leave as
many of the internals alone as possible to avoid even more churn.
While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
With time steering moved to userspace, stp can be enabled
by default.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
With moving time steering to userspace, there's no need
to handle leap seconds inside the kernel. Remove it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Remove the in-kernel time steering in favour of the new
ptp s390 driver, which allows the kernel clock to be steered
more precise.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
This is a cosmetic change because when in smp_emergency_stop()
the system is going to die anyway. But still change the code
to use get_tod_clock_monotonic() to prevent people from copying
broken code.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The PAI crypto event number 4210 is named
PCC_COMPUTE_LAST_BLOCK_CMAC_USING_ENCRYPTED_AES_256A
According to the z16 and z17 Principle of Operation documents
SA22-7832-13 and SA22-7832-14 the event is named
PCC_COMPUTE_LAST_BLOCK_CMAC_USING_ENCRYPTED_AES_256
without a trailing 'A'.
Adjust this event name.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-18-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Define a small SDTL_INIT(maskfn, flagsfn, name) macro and use it to build the
sched_domain_topology_level array. Purely a cleanup; behaviour is unchanged.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250710105715.66594-2-me@linux.beauty
|
|
When an stp sync check is handled on a system with multiple
cpus each cpu gets a machine check but only the first one
actually handles the sync operation. All other CPUs spin
waiting for the first one to finish with a short udelay().
But udelay can't be used here as the first CPU modifies tod_clock_base
before performing the sync op. During this timeframe
get_tod_clock_monotonic() might return a non-monotonic time.
The time spent waiting should be very short and udelay is a busy loop
anyways, therefore simply remove the udelay.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
In case of an early crash the early program check handler also prints the
last breaking event address which is contained within the pt_regs
structure. However it is not initialized, and therefore a more or less
random value is printed in case of a crash.
Copy the last breaking event address from lowcore to pt_regs in case of an
early program check to address this. This also makes it easier to analyze
early crashes.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Introduce file_getattr() and file_setattr() syscalls to manipulate inode
extended attributes. The syscalls takes pair of file descriptor and
pathname. Then it operates on inode opened accroding to openat()
semantics. The struct file_attr is passed to obtain/change extended
attributes.
This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
that file don't need to be open as we can reference it with a path
instead of fd. By having this we can manipulated inode extended
attributes not only on regular files but also on special ones. This
is not possible with FS_IOC_FSSETXATTR ioctl as with special files
we can not call ioctl() directly on the filesystem inode using fd.
This patch adds two new syscalls which allows userspace to get/set
extended inode attributes on special files by using parent directory
and a path - *at() like syscall.
CC: linux-api@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-xfs@vger.kernel.org
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-6-c4e3bc35227b@kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
pcpu_ec_call() uses either the external call or emergency signal order
code to signal (aka send an IPI) to a remote CPU. If the remote CPU is
not running the emergency signal order is used.
Measurements show that always using the external order code is at least
as good, and sometimes even better, than the existing code.
Therefore remove emergency signal order code usage from pcpu_ec_call().
Suggested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Heiko Carstens says:
===================
A rather large series which is supposed to fix the crash below[1], which was
seen when running the memop kernel kvm selftest.
Problem is that cmpxchg_user_key() is executing code with a non-default
key. If a system is IPL'ed with "LOAD NORMAL", and in addition the previous
system used storage keys where the fetch-protection bit is set for some pages,
and the cmpxchg_user_key() is located within such page a protection exception
will happen when executing such code.
Idea of this series is to register all code locations running with a
non-default key at compile time. All functions, which run with a non-default
key, then must explicitly call an init function which initializes the storage
key of all pages containing such code locations with default key, which
prevents such protection exceptions.
Furthermore all functions containing code which may be executed with a
non-default access key must be marked with __kprobes to prevent out-of-line
execution of any instruction of such functions, which would result in the same
problem.
By default the kernel will not issue any storage key changing instructions
like before, which will preserve the keyless-subset mode optimizations in
hosts.
Other possible implementations which I discarded:
- Moving the code to an own section. This would require an s390 specific
change to modpost.c, which complains about section mismatches (EX_TABLE
entries in non-default text section). No other architecture has something
similar, so let's keep this architecture specific hack local.
- Just apply the default storage key to the whole kprobes text
section. However this would add special s390 semantics to the kprobes text
section, which no other architecture has. History has shown that such hacks
fire back sooner or later.
Furthermore, and to keep this whole stuff quite simple, this only works for
code locations in core kernel code, not within modules. After this series
there is no module code left with such code, and as of now I don't see any new
kernel code which runs with a non-default access key.
Note: the original crash can be reproduced by replacing
page_set_storage_key(real, PAGE_DEFAULT_KEY, 1);
with
page_set_storage_key(real, 8, 1);
in arch/s390/kernel/skey.c:__skey_regions_initialize()
And then run tools/testing/selftests/kvm/s390/memop from the kernel selftests.
[1]:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 000000000000080b
Fault in home space mode while using kernel ASCE.
AS:0000000002528007 R3:00000001ffffc007 S:00000001ffffb801 P:000000000000013d
Oops: 0004 ilc:1 [#1]SMP
Modules linked in:
CPU: 3 UID: 0 PID: 791 Comm: memop Not tainted 6.16.0-rc1-00006-g3b568201d0a6-dirty #11 NONE
Hardware name: IBM 3931 A01 704 (z/VM 7.4.0)
Krnl PSW : 0794f00180000000 000003ffe0f4d91e (__cmpxchg_user_key1+0xbe/0x190)
R:0 T:1 IO:1 EX:1 Key:9 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
Krnl GPRS: 070003ffdfbf6af0 0000000000070000 0000000095b5a300 0000000000000000
00000000f1000000 0000000000000000 0000000000000090 0000000000000000
0000000000000040 0000000000000018 000003ff9b23d000 0000037fe0ef7bd8
000003ffdfbf7500 00000000962e4000 0000037f00ffffff 0000037fe0ef7aa0
Krnl Code: 000003ffe0f4d912: ad03f0a0 stosm 160(%r15),3
000003ffe0f4d916: a7780000 lhi %r7,0
#000003ffe0f4d91a: b20a6000 spka 0(%r6)
>000003ffe0f4d91e: b2790100 sacf 256
000003ffe0f4d922: a56f0080 llill %r6,128
000003ffe0f4d926: 5810a000 l %r1,0(%r10)
000003ffe0f4d92a: 141e nr %r1,%r14
000003ffe0f4d92c: c0e7ffffffff xilf %r14,4294967295
Call Trace:
[<000003ffe0f4d91e>] __cmpxchg_user_key1+0xbe/0x190
[<000003ffe0189c6e>] cmpxchg_guest_abs_with_key+0x2fe/0x370
[<000003ffe016d28e>] kvm_s390_vm_mem_op_cmpxchg+0x17e/0x350
[<000003ffe0173284>] kvm_arch_vm_ioctl+0x354/0x6f0
[<000003ffe015fedc>] kvm_vm_ioctl+0x2cc/0x6e0
[<000003ffe05348ae>] vfs_ioctl+0x2e/0x70
[<000003ffe0535e70>] __s390x_sys_ioctl+0xe0/0x100
[<000003ffe0f40f06>] __do_syscall+0x136/0x340
[<000003ffe0f4cb2e>] system_call+0x6e/0x90
Last Breaking-Event-Address:
[<000003ffe0f4d896>] __cmpxchg_user_key1+0x36/0x190
===================
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The current assumption is that kernel code is always executed with access
key zero, which means that storage key protection does not apply.
However this assumption is not correct: cmpxchg_user_key() may be executed
with a non-zero key; if then the storage key of the page which belongs to
the cmpxchg_user_key() code contains a key with fetch-protection enabled
the result is a protection exception.
For several performance optimizations storage keys are not initialized on
system boot. To keep these optimizations add infrastructure which allows to
define code ranges within functions which are executed with a non-default
key. When such code is executed such functions must explicitly call
skey_regions_initialize().
This will initialize all storage keys belonging to such code ranges in a
way that no protection exceptions happen when the code is executed with a
non-default access key.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
In case of an unrecoverable machine check only the machine check interrupt
code is printed to the console before the machine is stopped. This makes
root cause analysis sometimes hard.
Print additional machine check information to make analysis easier.
The output now looks like this:
Unrecoverable machine check, code: 00400F5F4C3B0000
6.16.0-rc2-11605-g987a9431e53a-dirty
HW: IBM 3931 A01 704 (z/VM 7.4.0)
PSW: 0706C00180000000 000003FFE0F0462E PFX: 0000000000070000
LBA: 000003FFE0F0462A EDC: 0000000000000000 FSA: 0000000000000000
CRS:
0080000014966A12 0000000087CB41C7 0000000000BFF140 0000000000000000
000000000000FFFF 0000000000BFF140 0000000071000000 0000000087CB41C7
0000000000008000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 00000000024C0007 00000000DB000000 0000000000BFF000
GPRS:
FFFFFFFF00000000 000003FFE0F0462E E10EA4F489F897A6 0000000000000000
7FFFFFF2C0413C4C 000003FFE19B7010 0000000000000000 0000000000000000
0000000000000000 00000001F76B3380 000003FFE15D4050 0000000000000005
0000000000000000 0000000000070000 000003FFE0F0586C 0000037FE00B7DA0
System stopped
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Remove include <linux/export.h> from all files which do not contain an
EXPORT_SYMBOL().
See commit 7d95680d64ac ("scripts/misc-check: check unnecessary #include
<linux/export.h> when W=1") for more details.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Explicitly include <linux/export.h> in files which contain an
EXPORT_SYMBOL().
See commit a934a57a42f6 ("scripts/misc-check: check missing #include
<linux/export.h> when W=1") for more details.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The normal bin_attrs field can now handle const pointers.
This makes the _new variant unnecessary.
Switch all users back.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-4-724bfcf05b99@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
genksyms: Fix enum consts from a reference affecting new values
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
efi/libstub: use 'targets' instead of extra-y in Makefile
module: make __mod_device_table__* symbols static
scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
scripts/misc-check: check missing #include <linux/export.h> when W=1
scripts/misc-check: add double-quotes to satisfy shellcheck
kbuild: move W=1 check for scripts/misc-check to top-level Makefile
scripts/tags.sh: allow to use alternative ctags implementation
kconfig: introduce menu type enum
docs: symbol-namespaces: fix reST warning with literal block
kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
docs/core-api/symbol-namespaces: drop table of contents and section numbering
modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
kbuild: move kbuild syntax processing to scripts/Makefile.build
Makefile: remove dependency on archscripts for header installation
Documentation/kbuild: Add new gendwarfksyms kABI rules
Documentation/kbuild: Drop section numbers
...
|
|
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN),
which behaves equivalently.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
Pull more kvm updates from Paolo Bonzini:
Generic:
- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra
- Add MGLRU support to the access tracking perf test
ARM fixes:
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding behind stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping
- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet
s390:
- Fix interaction between some filesystems and Secure Execution
- Some cleanups and refactorings, preparing for an upcoming big
series
x86:
- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN
- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM
- Refine and harden handling of spurious faults
- Add support for ALLOWED_SEV_FEATURES
- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y
- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits
- Don't account temporary allocations in sev_send_update_data()
- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold
- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX
- Advertise support to userspace for WRMSRNS and PREFETCHI
- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing
- Add a module param to control and enumerate support for device
posted interrupts
- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels
- Flush shadow VMCSes on emergency reboot
- Add support for SNP to the various SEV selftests
- Add a selftest to verify fastops instructions via forced emulation
- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
* Fix interaction between some filesystems and Secure Execution
* Some cleanups and refactorings, preparing for an upcoming big series
|
|
All functions in kvm/gmap.c fit better in kvm/pv.c instead.
Move and rename them appropriately, then delete the now empty
kvm/gmap.c and kvm/gmap.h.
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20250528095502.226213-5-imbrenda@linux.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20250528095502.226213-5-imbrenda@linux.ibm.com>
|
|
Currently, starting a PV VM on an iomap-based filesystem with large
folio support, such as XFS, will not work. We'll be stuck in
unpack_one()->gmap_make_secure(), because we can't seem to make progress
splitting the large folio.
The problem is that we require a writable PTE but a writable PTE under such
filesystems will imply a dirty folio.
So whenever we have a writable PTE, we'll have a dirty folio, and dirty
iomap folios cannot currently get split, because
split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio()
will fail in iomap_release_folio().
So we will not make any progress splitting such large folios.
Until dirty folios can be split more reliably, let's manually trigger
writeback of the problematic folio using
filemap_write_and_wait_range(), and retry the split immediately
afterwards exactly once, before looking up the folio again.
Should this logic be part of split_folio()? Likely not; most split users
don't have to split so eagerly to make any progress.
For now, this seems to affect xfs, zonefs and erofs, and this patch
makes it work again (tested on xfs only).
While this could be considered a fix for commit 6795801366da ("xfs: Support
large folios"), commit df2f9708ff1f ("zonefs: enable support for large
folios") and commit ce529cc25b18 ("erofs: enable large folios for iomap
mode"), before commit eef88fe45ac9 ("s390/uv: Split large folios in
gmap_make_secure()"), we did not try splitting large folios at all. So it's
all rather part of making SE compatible with file systems that support
large folios. But to have some "Fixes:" tag, let's just use eef88fe45ac9.
Not CCing stable, because there are a lot of dependencies, and it simply
not working is not critical in stable kernels.
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-58218
Fixes: eef88fe45ac9 ("s390/uv: Split large folios in gmap_make_secure()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250516123946.1648026-4-david@redhat.com
Message-ID: <20250516123946.1648026-4-david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Let's consistently return 0 if the operation was successful, and just
detect ourselves whether splitting is required -- folio_test_large() is
a cheap operation.
Update the documentation.
Should we simply always return -EAGAIN instead of 0, so we don't have
to handle it in the caller? Not sure, staring at the documentation, this
way looks a bit cleaner.
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250516123946.1648026-3-david@redhat.com
Message-ID: <20250516123946.1648026-3-david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
successful
If s390_wiggle_split_folio() returns 0 because splitting a large folio
succeeded, we will return 0 from make_hva_secure() even though a retry
is required. Return -EAGAIN in that case.
Otherwise, we'll return 0 from gmap_make_secure(), and consequently from
unpack_one(). In kvm_s390_pv_unpack(), we assume that unpacking
succeeded and skip unpacking this page. Later on, we run into issues
and fail booting the VM.
So far, this issue was only observed with follow-up patches where we
split large pagecache XFS folios. Maybe it can also be triggered with
shmem?
We'll cleanup s390_wiggle_split_folio() a bit next, to also return 0
if no split was required.
Fixes: d8dfda5af0be ("KVM: s390: pv: fix race when making a page secure")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250516123946.1648026-2-david@redhat.com
Message-ID: <20250516123946.1648026-2-david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to the Intel
driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the x86 and PowerPC
platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
Das, Thorsten Blum)"
* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
perf/headers: Clean up <linux/perf_event.h> a bit
perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
mips/perf: Remove driver-specific throttle support
xtensa/perf: Remove driver-specific throttle support
sparc/perf: Remove driver-specific throttle support
loongarch/perf: Remove driver-specific throttle support
csky/perf: Remove driver-specific throttle support
arc/perf: Remove driver-specific throttle support
alpha/perf: Remove driver-specific throttle support
perf/apple_m1: Remove driver-specific throttle support
perf/arm: Remove driver-specific throttle support
s390/perf: Remove driver-specific throttle support
powerpc/perf: Remove driver-specific throttle support
perf/x86/zhaoxin: Remove driver-specific throttle support
perf/x86/amd: Remove driver-specific throttle support
perf/x86/intel: Remove driver-specific throttle support
perf: Only dump the throttle log for the leader
perf: Fix the throttle logic for a group
perf/core: Add the is_event_in_freq_mode() helper to simplify the code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Large rework of the protected key crypto code to allow for
asynchronous handling without memory allocation
- Speed up system call entry/exit path by re-implementing lazy ASCE
handling
- Add module autoload support for the diag288_wdt watchdog device
driver
- Get rid of s390 specific strcpy() and strncpy() implementations, and
switch all remaining users to strscpy() when possible
- Various other small fixes and improvements
* tag 's390-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (51 commits)
s390/pci: Serialize device addition and removal
s390/pci: Allow re-add of a reserved but not yet removed device
s390/pci: Prevent self deletion in disable_slot()
s390/pci: Remove redundant bus removal and disable from zpci_release_device()
s390/crypto: Extend protected key conversion retry loop
s390/pci: Fix __pcilg_mio_inuser() inline assembly
s390/ptrace: Always inline regs_get_kernel_stack_nth() and regs_get_register()
s390/thread_info: Cleanup header includes
s390/extmem: Add workaround for DCSS unload diag
s390/crypto: Rework protected key AES for true asynch support
s390/cpacf: Rework cpacf_pcc() to return condition code
s390/mm: Fix potential use-after-free in __crst_table_upgrade()
s390/mm: Add mmap_assert_write_locked() check to crst_table_upgrade()
s390/string: Remove strcpy() implementation
s390/con3270: Use strscpy() instead of strcpy()
s390/boot: Use strspcy() instead of strcpy()
s390: Simple strcpy() to strscpy() conversions
s390/pkey/crypto: Introduce xflags param for pkey in-kernel API
s390/pkey: Provide and pass xflags within pkey and zcrypt layers
s390/uv: Remove uv_get_secret_metadata function
...
|
|
The throttle support has been added in the generic code. Remove
the driver-specific throttle support.
Besides the throttle, perf_event_overflow may return true because of
event_limit. It already does an inatomic event disable. The pmu->stop
is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20250520181644.2673067-8-kan.liang@linux.intel.com
|
|
Both regs_get_kernel_stack_nth() and regs_get_register() are not
inlined. With the new ftrace funcgraph-args feature they show up in
function graph tracing:
4) | sched_core_idle_cpu(cpu=4) {
4) 0.257 us | regs_get_register(regs=0x37fe00afa10, offset=2);
4) 0.218 us | regs_get_register(regs=0x37fe00afa10, offset=3);
4) 0.225 us | regs_get_register(regs=0x37fe00afa10, offset=4);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=5);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=6);
4) 0.245 us | regs_get_kernel_stack_nth(regs=0x37fe00afa10, n=20);
This is subtoptimal, since both functions are supposed to be ftrace
internal helper functions. If they appear in ftrace traces this reduces
readability significantly, plus this adds tons of extra useless extra
entries.
Address this by moving both functions and required helpers to ptrace.h and
always inline them. This way they don't appear in traces anymore. In
addition the overhead that comes with functions calls is also reduced.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In case of stack corruption stack_invalid() is called and the expectation
is that register r10 contains the last breaking event address. This
dependency is quite subtle and broke a couple of years ago without that
anybody noticed.
Fix this by getting rid of the dependency and read the last breaking event
address from lowcore.
Fixes: 56e62a737028 ("s390: convert to generic entry")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Convert all strcpy() usages to strscpy() where the conversion means
just replacing strcpy() with strscpy(). strcpy() is deprecated since
it performs no bounds checking on the destination buffer.
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The uv_get_secret_metadata() in-kernel function was only
offered and used by the pkey uv handler. Remove it as there
is no customer any more.
Suggested-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Acked-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424133619.16495-24-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rename the internal UV function find_secret() to uv_find_secret()
and publish it as new UV API in-kernel function.
The pkey uv handler may be called in a do-not-allocate memory
situation where sleeping is allowed but allocating memory which
may cause IO operations is not. For example when an encrypted
swap file is used and the encryption is done via UV retrievable
secrets with protected keys.
The UV API function uv_get_secret_metadata() allocates memory
and then calls the find_secret() function. By exposing the
find_secret() function as a new UV API function uv_find_secret()
it is possible to retrieve UV secret meta data without any
memory allocations from the UV when the caller offers space
for one struct uv_secret_list.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Acked-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424133619.16495-22-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In CPUMF attribute definitions for z15 all CPUMF attributes
have configuration values of the form 0x0[0-9a-f]{3} .
However 2 defines do not match this scheme, they have two leading
zeroes instead of one. Adjust this. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The third argument of strscpy() is optional and can be left away iff
the destination is an array and the maximum size of the copy is the
size of destination.
Remove the third argument for those cases where this is possible.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rename strncpy_skip_quote() to strscpy_skip_quote() and change its
implementation so that the destination string is always NUL terminated.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The s390 specific diag288_wdt watchdog driver makes use of the virtual
watchdog timer, which is available in most machine configurations.
If executing the diagnose instruction with subcode 0x288 results in an
exception the watchdog timer is not available, otherwise it is available.
In order to allow module autoload of the diag288_wdt module, move the
detection of the virtual watchdog timer to early boot code, and provide
its availability as a cpu feature.
This allows to make use of module_cpu_feature_match() to automatically load
the module iff the virtual watchdog timer is available.
Suggested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Tested-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250410095036.1525057-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Reduce system call overhead time (round trip time for invoking a
non-existent system call) by 25%.
With the removal of set_fs() [1] lazy control register handling was removed
in order to keep kernel entry and exit simple. However this made system
calls slower.
With the conversion to generic entry [2] and numerous follow up changes
which simplified the entry code significantly, adding support for lazy asce
handling doesn't add much complexity to the entry code anymore.
In particular this means:
- On kernel entry the primary asce is not modified and contains the user
asce
- Kernel accesses which require secondary-space mode (for example futex
operations) are surrounded by enable_sacf_uaccess() and
disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce
to kernel asce so that the sacf instruction can be used to switch to
secondary-space mode. The primary asce is changed back to user asce with
disable_sacf_uaccess().
The state of the control register which contains the primary asce is
reflected with a new TIF_ASCE_PRIMARY bit. This is required on context
switch so that the correct asce is restored for the scheduled in process.
In result address spaces are now setup like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
-----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel (no sacf) | user | user | kernel
kernel (during sacf uaccess) | kernel | user | kernel
kernel (kvm guest execution) | guest | user | kernel
In result cr1 control register content is not changed except for:
- futex system calls
- legacy s390 PCI system calls
- the kvm specific cmpxchg_user_key() uaccess helper
This leads to faster system call execution.
[1] 87d598634521 ("s390/mm: remove set_fs / rework address space handling")
[2] 56e62a737028 ("s390: convert to generic entry")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In PMU event initialization functions
- cpumsf_pmu_event_init()
- cpumf_pmu_event_init()
- cfdiag_event_init()
the partially created event had to be removed when an error was detected.
The event::event_init() member function had to release all resources
it allocated in case of error. event::destroy() had to be called
on freeing an event after it was successfully created and
event::event_init() returned success.
With
commit c70ca298036c ("perf/core: Simplify the perf_event_alloc() error path")
this is not necessary anymore. The performance subsystem common
code now always calls event::destroy() to clean up the allocated
resources created during event initialization.
Remove the event::destroy() invocation in PMU event initialization
or that function is called twice for each event that runs into an
error condition in event creation.
This is the kernel log entry which shows up without the fix:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 43388 at lib/refcount.c:87 refcount_dec_not_one+0x74/0x90
CPU: 0 UID: 0 PID: 43388 Comm: perf Not tainted 6.15.0-20250407.rc1.git0.300.fc41.s390x+git #1 NONE
Hardware name: IBM 3931 A01 704 (LPAR)
Krnl PSW : 0704c00180000000 00000209cb2c1b88 (refcount_dec_not_one+0x78/0x90)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000020900000027 0000020900000023 0000000000000026 0000018900000000
00000004a2200a00 0000000000000000 0000000000000057 ffffffffffffffea
00000002b386c600 00000002b3f5b3e0 00000209cc51f140 00000209cc7fc550
0000000001449d38 ffffffffffffffff 00000209cb2c1b84 00000189d67dfb80
Krnl Code: 00000209cb2c1b78: c02000506727 larl %r2,00000209cbcce9c6
00000209cb2c1b7e: c0e5ffbd4431 brasl %r14,00000209caa6a3e0
#00000209cb2c1b84: af000000 mc 0,0
>00000209cb2c1b88: a7480001 lhi %r4,1
00000209cb2c1b8c: ebeff0a00004 lmg %r14,%r15,160(%r15)
00000209cb2c1b92: ec243fbf0055 risbg %r2,%r4,63,191,0
00000209cb2c1b98: 07fe bcr 15,%r14
00000209cb2c1b9a: 47000700 bc 0,1792
Call Trace:
[<00000209cb2c1b88>] refcount_dec_not_one+0x78/0x90
[<00000209cb2c1dc4>] refcount_dec_and_mutex_lock+0x24/0x90
[<00000209caa3c29e>] hw_perf_event_destroy+0x2e/0x80
[<00000209cacaf8b4>] __free_event+0x74/0x270
[<00000209cacb47c4>] perf_event_alloc.part.0+0x4a4/0x730
[<00000209cacbf3e8>] __do_sys_perf_event_open+0x248/0xc20
[<00000209cacc14a4>] __s390x_sys_perf_event_open+0x44/0x50
[<00000209cb8114de>] __do_syscall+0x12e/0x260
[<00000209cb81ce34>] system_call+0x74/0x98
Last Breaking-Event-Address:
[<00000209caa6a4d2>] __warn_printk+0xf2/0x100
---[ end trace 0000000000000000 ]---
Fixes: c70ca298036c ("perf/core: Simplify the perf_event_alloc() error path")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Update CPU Measurement counter facility support for the
extended counter set for machine types 9175 and 9176.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add detection for machine types 0x9175 and 0x9176 and set ELF platform
name to z17.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
missing base register for relocated lowcore address
- Fix build failure on older linkers by conditionally adding the
-no-pie linker option only when it is supported
- Fix inaccurate kernel messages in vfio-ap by providing descriptive
error notifications for AP queue sharing violations
- Fix PCI isolation logic by ensuring non-VF devices correctly return
false in zpci_bus_is_isolated_vf()
- Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
proper sentinel element, preventing potential overruns and
translation errors
- Cleanup header dependency problems with asm-offsets.c
- Add fault info for unexpected low-address protection faults in user
mode
- Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
handling with common code handling
- Use bitop functions to implement CPU flag helper functions to ensure
that bits cannot get lost if modified in different contexts on a CPU
- Remove unused machine_flags for the lowcore
* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
s390/pci: Fix dev.dma_range_map missing sentinel element
s390/mm: Dump fault info in case of low address protection fault
s390/smp: Add support for HOTPLUG_SMT
s390: Fix linker error when -no-pie option is unavailable
s390/processor: Use bitop functions for cpu flag helper functions
s390/asm-offsets: Remove ASM_OFFSETS_C
s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
s390/kvm: Split kvm_host header file
s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
s390/lowcore: Remove unused machine_flags
s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
|
|
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on s390, covering the
vdso.
[hca@linux.ibm.com: update supported architectures]
Link: https://lkml.kernel.org/r/20250317131917.1332402-1-hca@linux.ibm.com
Link: https://lkml.kernel.org/r/20250311123326.2686682-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|