Age | Commit message (Collapse) | Author |
|
Introduce a global function that jumps from the decompressor to the
decompressed kernel. Put its address into svc_old_psw, from where GDB
can take it without loading decompressor symbols. It should be
available throughout the entire decompressor execution, because it's
placed there statically, and nothing in the decompressor uses the SVC
instruction.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250625154220.75300-2-iii@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembler code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This is bad since macros starting with two underscores are names
that are reserved by the C language. It can also be very confusing
for the developers when switching between userspace and kernelspace
coding, or when dealing with uapi headers that rather should use
__ASSEMBLER__ instead. So let's now standardize on the __ASSEMBLER__
macro that is provided by the compilers.
This is a completely mechanical patch (done with a simple "sed -i"
statement), with some manual fixups done later while rebasing the
patch.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250611140046.137739-3-thuth@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The early boot code contains various open-coded inline assemblies with
exception handling. In order to handle possible exceptions each of them
changes the program check new psw, and restores it.
In order to simplify the various inline assemblies add simple exception
table support: the program check handler is called with a fully populated
pt_regs on the stack and may change the psw and register members. When the
program check handler returns the psw and registers from pt_regs will be
used to continue execution.
The program check handler searches the exception table for an entry which
matches the address of the program check. If such an entry is found the psw
address within pt_regs on the stack is replaced with a fixup address, and
execution continues at the new address.
If no entry is found the psw is changed to a disabled wait psw and
execution stops.
Before entering the C part of the program check handler the address of the
program check new psw is replaced to a minimalistic handler.
This is supposed to help against program check loops. If an exception
happens while in program check processing the register contents of the
original exception are restored and a disabled wait psw is loaded.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Setup a pt_regs structure on the stack, poplulate it in low level assembler
code, and pass it to print_pgm_check_info(). This way there is no need to
access then lowcore from print_pgm_check_info() anymore, and the function
looks like a normal program check handler function.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Similar to x86 and loongarch add a "debug-alternative" command line
parameter, which allows for alternative debugging. The parameter
itself comes with architecture specific semantics:
"debug-alternative"
-> print debug message for every single alternative
"debug-alternative=0;2"
-> print debug message for all alternatives with type 0 and 2
"debug-alternative=0:0-7"
-> print debug message for all alternatives with type 0 which have a
facility number within the range of 0-7
"debug-alternative=0:!8;1"
-> print debug message for all alternatives with type 0, for all
facility numbers, except facility 8, and in addition print all
alternatives with type 1
A defconfig build currently results in a kernel with more than 20.000
alternatives, where the majority is for the niai alternative (spinlocks),
and the relocated lowcore alternative. The following kernel command like
options limit alternative debug output, and enable dynamic debug messages:
debug-alternative=0:!49;1:!0
earlyprintk
bootdebug
ignore_loglevel
loglevel=8
dyndbg="file alternative.c +p"
This results in output like this:
alt: [0/ 11] 0000021b9ce8680c: c0f400000089 -> c00400000000
alt: [0/ 64] 0000021b9ce87e60: c0f400000043 -> c00400000000
alt: [0/133] 0000021b9ce88c56: c0f400000027 -> c00400000000
alt: [0/ 74] 0000021b9ce89410: c0f40000002a -> c00400000000
alt: [0/ 40] 0000021b9dc3720a: 47000000 -> b280d398
alt: [0/193] 0000021b9dc37306: 47000000 -> b201d2b0
alt: [0/193] 0000021b9dc37354: c00400000000 -> d20720c0d2b0
alt: [1/ 5] 0000038d720d7bf2: c0f400000016 -> c00400000000
With
[<alternative type>/<alternative data>] <address> oldcode -> newcode
Alternative data depends on the alternative type: for type 0
(ALT_TYPE_FACILITY) data is the facility. For type 1 (ALT_TYPE_FEATURE)
data is the corresponding machine feature.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Convert MACHINE_HAS_... to cpu_has_...() which uses test_facility() instead
of testing the machine_flags lowcore member if the feature is present.
test_facility() generates better code since it results in a static branch
without accessing memory. The branch is patched via alternatives by the
decompressor depending on the availability of the required facility.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Convert MACHINE_HAS_... to cpu_has_...() which uses test_facility() instead
of testing the machine_flags lowcore member if the feature is present.
test_facility() generates better code since it results in a static branch
without accessing memory. The branch is patched via alternatives by the
decompressor depending on the availability of the required facility.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Introduce boot_debug() calls to track memory detection, online ranges,
reserved areas, and allocations (except for VMEM allocations, which are
too frequent). Instead introduce dump_physmem_reserved() function which
prints out full memory tracking information. This helps in debugging
early boot memory handling.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Dump the boot message ring buffer when a crash occurs during boot, but
only if bootdebug is enabled. This helps assist in analyzing boot-time
issues by providing additional debugging information.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Enhance boot debugging by allowing the "bootdebug" kernel parameter to
accept an optional comma-separated list of prefixes. Only debug messages
starting with these prefixes will be printed during boot. For example:
bootdebug=startup,vmem
Not specifying a filter for the "bootdebug" parameter prints all debug
messages. The `boot_fmt` macro can be defined to set a common prefix:
#define boot_fmt(fmt) "startup: " fmt
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Modify boot_printk() to return int, aligning it with printk().
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Add message severity levels for boot messages, similar to the main
kernel. Support command-line options that control console output
verbosity, including "loglevel," "ignore_loglevel," "debug," and "quiet".
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Add physmem_alloc() as a variant of physmem_alloc_or_die() that can return
an error instead of triggering a panic on OOM. This allows callers to
implement alternative fallback paths.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The new name better reflects the function's behavior, emphasizing that
it will terminate execution if allocation fails.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
By default page protection definitions like PAGE_RX have the _PAGE_NOEXEC
bit set. For older machines without the instruction execution protection
facility this bit is not allowed to be used in page table entries, and
therefore must be removed.
This is done at a couple of page table walkers, but also at some but not
all page table modification functions like ptep_modify_prot_commit(). Avoid
all of this and change the page, segment and region3 protection definitions
so that the noexec bit is masked out automatically if the instruction
execution-protection facility is not available. This is similar to what
also various other architectures do which had to solve the same problem.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Rename decompressor_printk() to boot_printk() just to have a shorter
function name, which also makes the code more readable.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The s390 architecture defines two special per-CPU data pages
called the "prefix area". In s390-linux terminology this is usually
called "lowcore". This memory area contains system configuration
data like old/new PSW's for system call/interrupt/machine check
handlers and lots of other data. It is normally mapped to logical
address 0. This area can only be accessed when in supervisor mode.
This means that kernel code can dereference NULL pointers, because
accesses to address 0 are allowed. Parts of lowcore can be write
protected, but read accesses and write accesses outside of the write
protected areas are not caught.
To remove this limitation for debugging and testing, remap lowcore to
another address and define a function get_lowcore() which simply
returns the address where lowcore is mapped at. This would normally
introduce a pointer dereference (=memory read). As lowcore is used
for several very often used variables, add code to patch this function
during runtime, so we avoid the memory reads.
For C code get_lowcore() has to be used, for assembly code it is
the GET_LC macro. When using this macro/function a reference is added
to alternative patching. All these locations will be patched to the
actual lowcore location when the kernel is booted or a module is loaded.
To make debugging/bisecting problems easier, this patch adds all the
infrastructure but the lowcore address is still hardwired to 0. This
way the code can be converted on a per function basis, and the
functionality is enabled in a patch after all the functions have
been converted.
Note that this requires at least z16 because the old lpsw instruction
only allowed a 12 bit displacement. z16 introduced lpswey which allows
20 bits (signed), so the lowcore can effectively be mapped from
address 0 - 0x7e000. To use 0x7e000 as address, a 6 byte lgfi
instruction would have to be used in the alternative. To save two
bytes, llilh can be used, but this only allows to set bits 16-31 of
the address. In order to use the llilh instruction, use 0x70000 as
alternative lowcore address. This is still large enough to catch
NULL pointer dereferences into large arrays.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add the required code to patch alternatives early in the decompressor.
This is required for the upcoming lowcore relocation changes, where
alternatives for facility 193 need to get patched before lowcore
alternatives.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When the kernel is built with CONFIG_PIE_BUILD option enabled it
uses dynamic symbols, for which the linker does not allow more
than 64K number of entries. This can break features like kpatch.
Hence, whenever possible the kernel is built with CONFIG_PIE_BUILD
option disabled. For that support of unaligned symbols generated by
linker scripts in the compiler is necessary.
However, older compilers might lack such support. In that case the
build process resorts to CONFIG_PIE_BUILD option-enabled build.
Compile object files with -fPIC option and then link the kernel
binary with -no-pie linker option.
As result, the dynamic symbols are not generated and not only kpatch
feature succeeds, but also the whole CONFIG_PIE_BUILD option-enabled
code could be dropped.
[ agordeev: Reworded the commit message ]
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED
kernel configuration variable.
In case CONFIG_KERNEL_UNCOMPRESSED is disabled avoid uncompressing
the kernel to a temporary buffer and copying it to the target
address. Instead, uncompress it directly to the target destination.
In case CONFIG_KERNEL_UNCOMPRESSED is enabled avoid moving the
kernel to default 0x100000 location when KASLR is disabled or
failed. Instead, use the uncompressed kernel image directly.
In case KASLR is disabled or failed .amode31 section location in
memory is not randomized and precedes the kernel image. In case
CONFIG_KERNEL_UNCOMPRESSED is disabled that location overlaps the
area used by the decompression algorithm. That is fine, since that
area is not used after the decompression finished and the size of
.amode31 section is not expected to exceed BOOT_HEAP_SIZE ever.
There is no decompression in case CONFIG_KERNEL_UNCOMPRESSED is
enabled. Therefore, rename decompress_kernel() to deploy_kernel(),
which better describes both uncompressed and compressed cases.
Introduce AMODE31_SIZE macro to avoid immediate value of 0x3000
(the size of .amode31 section) in the decompressor linker script.
Modify the vmlinux linker script to force the size of .amode31
section to AMODE31_SIZE (the value of (_eamode31 - _samode31)
could otherwise differ as result of compiler options used).
Introduce __START_KERNEL macro that defines the kernel ELF image
entry point and set it to the currrent value of 0x100000.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The uncoupling physical vs virtual address spaces brings
the following benefits to s390:
- virtual memory layout flexibility;
- closes the address gap between kernel and modules, it
caused s390-only problems in the past (e.g. 'perf' bugs);
- allows getting rid of trampolines used for module calls
into kernel;
- allows simplifying BPF trampoline;
- minor performance improvement in branch prediction;
- kernel randomization entropy is magnitude bigger, as it is
derived from the amount of available virtual, not physical
memory;
The whole change could be described in two pictures below:
before and after the change.
Some aspects of the virtual memory layout setup are not
clarified (number of page levels, alignment, DMA memory),
since these are not a part of this change or secondary
with regard to how the uncoupling itself is implemented.
The focus of the pictures is to explain why __va() and __pa()
macros are implemented the way they are.
Memory layout in V==R mode:
| Physical | Virtual |
+- 0 --------------+- 0 --------------+ identity mapping start
| | S390_lowcore | Low-address memory
| +- 8 KB -----------+
| | |
| | identity | phys == virt
| | mapping | virt == phys
| | |
+- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
|.amode31 text/data|.amode31 text/data|
+- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt start
| | |
| | |
+- __kaslr_offset, __kaslr_offset_phys| kernel rand. phys/virt start
| | |
| kernel text/data | kernel text/data | phys == kvirt
| | |
+------------------+------------------+ kernel phys/virt end
| | |
| | |
| | |
| | |
+- ident_map_size -+- ident_map_size -+ identity mapping end
| |
| ... unused gap |
| |
+---- vmemmap -----+ 'struct page' array start
| |
| virtually mapped |
| memory map |
| |
+- __abs_lowcore --+
| |
| Absolute Lowcore |
| |
+- __memcpy_real_area
| |
| Real Memory Copy|
| |
+- VMALLOC_START --+ vmalloc area start
| |
| vmalloc area |
| |
+- MODULES_VADDR --+ modules area start
| |
| modules area |
| |
+------------------+ UltraVisor Secure Storage limit
| |
| ... unused gap |
| |
+KASAN_SHADOW_START+ KASAN shadow memory start
| |
| KASAN shadow |
| |
+------------------+ ASCE limit
Memory layout in V!=R mode:
| Physical | Virtual |
+- 0 --------------+- 0 --------------+
| | S390_lowcore | Low-address memory
| +- 8 KB -----------+
| | |
| | |
| | ... unused gap |
| | |
+- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
|.amode31 text/data|.amode31 text/data|
+- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end (<2GB)
| | |
| | |
+- __kaslr_offset_phys | kernel rand. phys start
| | |
| kernel text/data | |
| | |
+------------------+ | kernel phys end
| | |
| | |
| | |
| | |
+- ident_map_size -+ |
| |
| ... unused gap |
| |
+- __identity_base + identity mapping start (>= 2GB)
| |
| identity | phys == virt - __identity_base
| mapping | virt == phys + __identity_base
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+---- vmemmap -----+ 'struct page' array start
| |
| virtually mapped |
| memory map |
| |
+- __abs_lowcore --+
| |
| Absolute Lowcore |
| |
+- __memcpy_real_area
| |
| Real Memory Copy|
| |
+- VMALLOC_START --+ vmalloc area start
| |
| vmalloc area |
| |
+- MODULES_VADDR --+ modules area start
| |
| modules area |
| |
+- __kaslr_offset -+ kernel rand. virt start
| |
| kernel text/data | phys == (kvirt - __kaslr_offset) +
| | __kaslr_offset_phys
+- kernel .bss end + kernel rand. virt end
| |
| ... unused gap |
| |
+------------------+ UltraVisor Secure Storage limit
| |
| ... unused gap |
| |
+KASAN_SHADOW_START+ KASAN shadow memory start
| |
| KASAN shadow |
| |
+------------------+ ASCE limit
Unused gaps in the virtual memory layout could be present
or not - depending on how partucular system is configured.
No page tables are created for the unused gaps.
The relative order of vmalloc, modules and kernel image in
virtual memory is defined by following considerations:
- start of the modules area and end of the kernel should reside
within 4GB to accommodate relative 32-bit jumps. The best way
to achieve that is to place kernel next to modules;
- vmalloc and module areas should locate next to each other
to prevent failures and extra reworks in user level tools
(makedumpfile, crash, etc.) which treat vmalloc and module
addresses similarily;
- kernel needs to be the last area in the virtual memory
layout to easily distinguish between kernel and non-kernel
virtual addresses. That is needed to (again) simplify
handling of addresses in user level tools and make __pa()
macro faster (see below);
Concluding the above, the relative order of the considered
virtual areas in memory is: vmalloc - modules - kernel.
Therefore, the only change to the current memory layout is
moving kernel to the end of virtual address space.
With that approach the implementation of __pa() macro is
straightforward - all linear virtual addresses less than
kernel base are considered identity mapping:
phys == virt - __identity_base
All addresses greater than kernel base are kernel ones:
phys == (kvirt - __kaslr_offset) + __kaslr_offset_phys
By contrast, __va() macro deals only with identity mapping
addresses:
virt == phys + __identity_base
.amode31 section is mapped separately and is not covered by
__pa() macro. In fact, it could have been handled easily by
checking whether a virtual address is within the section or
not, but there is no need for that. Thus, let __pa() code
do as little machine cycles as possible.
The KASAN shadow memory is located at the very end of the
virtual memory layout, at addresses higher than the kernel.
However, that is not a linear mapping and no code other than
KASAN instrumentation or API is expected to access it.
When KASLR mode is enabled the kernel base address randomized
within a memory window that spans whole unused virtual address
space. The size of that window depends from the amount of
physical memory available to the system, the limit imposed by
UltraVisor (if present) and the vmalloc area size as provided
by vmalloc= kernel command line parameter.
In case the virtual memory is exhausted the minimum size of
the randomization window is forcefully set to 2GB, which
amounts to in 15 bits of entropy if KASAN is enabled or 17
bits of entropy in default configuration.
The default kernel offset 0x100000 is used as a magic value
both in the decompressor code and vmlinux linker script, but
it will be removed with a follow-up change.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Make the type of __vmlinux_relocs_64_start|end symbols as
char array, just like it is done for all other sections.
Function rescue_relocs() is simplified as result.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The end of GOT is calculated dynamically on boot. The size of GOT
is calculated on build from the start and end of GOT. Avoid both
calculations and use the end of GOT directly.
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux. This has a few problems:
- It uses dynamic symbols (.dynsym), for which the linker refuses to
allow more than 64k sections. This can break features which use
'-ffunction-sections' and '-fdata-sections', including kpatch-build
[1] and Function Granular KASLR.
- It unnecessarily uses GOT relocations, adding an extra layer of
indirection for many memory accesses.
Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.
This is done by first telling the linker to preserve all relocations
during the vmlinux link. (Note this is harmless: they are later
stripped in the vmlinux.bin link.)
Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections. The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.
(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'. So Clang-compiled kernels have a GOT, which needs to
be adjusted.)
On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.
[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html
Compiler consideration:
Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.
Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.
In addition to it:
move vmlinux.relocs to safe relocation
When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.
Compile warning fix from Sumanth Korikkar:
When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:
ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.
Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.
This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84598f2 ("x86/build: Add asserts for unwanted sections")
[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Improve the distribution algorithm of random base address to ensure
a uniformity among all suitable addresses. To generate a random value
once, and to build a continuous range in which every value is suitable,
count all the suitable addresses (referred to as positions) that can be
used as a base address. The positions are counted by iterating over the
usable memory ranges. For each range that is big enough to accommodate
the image, count all the suitable addresses where the image can be placed,
while taking reserved memory ranges into consideration.
A new function "iterate_valid_positions()" has dual purpose. Firstly, it
is called to count the positions in a given memory range, and secondly,
to convert a random position back to an address.
"get_random_base()" has been replaced with more generic
"randomize_within_range()" which now could be called for randomizing
base addresses not just for the kernel image.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Just like other architectures provide a kaslr_enabled() function, instead
of directly accessing a global variable.
Also pass the renamed __kaslr_enabled variable from the decompressor to the
kernel, so that kalsr_enabled() is available there too. This will be used
by a subsequent patch which randomizes the module base load address.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Since regular paging structs are initialized in decompressor already
move KASAN shadow mapping to decompressor as well. This helps to avoid
allocating KASAN required memory in 1 large chunk, de-duplicate paging
structs creation code and start the uncompressed kernel with KASAN
instrumentation right away. This also allows to avoid all pitfalls
accidentally calling KASAN instrumented code during KASAN initialization.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently several approaches for finding unused memory in decompressor
are utilized. While "safe_addr" grows towards higher addresses, vmem
code allocates paging structures top down. The former requires careful
ordering. In addition to that ipl report handling code verifies potential
intersections with secure boot certificates on its own. Neither of two
approaches are memory holes aware and consistent with each other in low
memory conditions.
To solve that, existing approaches are generalized and combined
together, as well as online memory ranges are now taken into
consideration.
physmem_info has been extended to contain reserved memory ranges. New
set of functions allow to handle reserves and find unused memory.
All reserves and memory allocations are "typed". In case of out of
memory condition decompressor fails with detailed info on current
reserved ranges and usable online memory.
Linux version 6.2.0 ...
Kernel command line: ... mem=100M
Our of memory allocating 100000 bytes 100000 aligned in range 0:5800000
Reserved memory ranges:
0000000000000000 0000000003e33000 DECOMPRESSOR
0000000003f00000 00000000057648a3 INITRD
00000000063e0000 00000000063e8000 VMEM
00000000063eb000 00000000063f4000 VMEM
00000000063f7800 0000000006400000 VMEM
0000000005800000 0000000006300000 KASAN
Usable online memory ranges (info source: sclp read info [3]):
0000000000000000 0000000006400000
Usable online memory total: 6400000 Reserved: 61b10a3 Free: 24ef5d
Call Trace:
(sp:000000000002bd58 [<0000000000012a70>] physmem_alloc_top_down+0x60/0x14c)
sp:000000000002bdc8 [<0000000000013756>] _pa+0x56/0x6a
sp:000000000002bdf0 [<0000000000013bcc>] pgtable_populate+0x45c/0x65e
sp:000000000002be90 [<00000000000140aa>] setup_vmem+0x2da/0x424
sp:000000000002bec8 [<0000000000011c20>] startup_kernel+0x428/0x8b4
sp:000000000002bf60 [<00000000000100f4>] startup_normal+0xd4/0xd4
physmem_alloc_range allows to find free memory in specified range. It
should be used for one time allocations only like finding position for
amode31 and vmlinux.
physmem_alloc_top_down can be used just like physmem_alloc_range, but
it also allows multiple allocations per type and tries to merge sequential
allocations together. Which is useful for paging structures allocations.
If sequential allocations cannot be merged together they are "chained",
allowing easy per type reserved ranges enumeration and migration to
memblock later. Extra "struct reserved_range" allocated for chaining are
not tracked or reserved but rely on the fact that both
physmem_alloc_range and physmem_alloc_top_down search for free memory
only below current top down allocator position. All reserved ranges
should be transferred to memblock before memblock allocations are
enabled.
The startup code has been reordered to delay any memory allocations until
online memory ranges are detected and occupied memory ranges are marked as
reserved to be excluded from follow-up allocations.
Ipl report certificates are a special case, ipl report certificates list
is checked together with other memory reserves until certificates are
saved elsewhere.
KASAN required memory for shadow memory allocation and mapping is reserved
as 1 large chunk which is later passed to KASAN early initialization code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In preparation to extending mem_detect with additional information like
reserved ranges rename it to more generic physmem_info. This new naming
also help to avoid confusion by using more exact terms like "physmem
online ranges", etc.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit bf64f0517e5d ("s390/mem_detect: handle online memory limit
just once") introduced truncation of mem_detect online ranges
based on identity mapping size. For kdump case however the full
set of online memory ranges has to be feed into memblock_physmem_add
so that crashed system memory could be extracted.
Instead of truncating introduce a "usable limit" which is respected by
mem_detect api. Also add extra online memory ranges iterator which still
provides full set of online memory ranges disregarding the "usable limit".
Fixes: bf64f0517e5d ("s390/mem_detect: handle online memory limit just once")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
If kernel is build without KASAN support there is a chance that kernel
image is going to be positioned by KASLR code to overlap with identity
mapping page tables.
When kernel is build with KASAN support enabled memory which
is potentially going to be used for page tables and KASAN
shadow mapping is accounted for in KASLR with the use of
kasan_estimate_memory_needs(). Split this function and introduce
vmem_estimate_memory_needs() to cover decompressor's vmem identity
mapping page tables.
Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Introduce mem_detect_truncate() to cut any online memory ranges above
established identity mapping size, so that mem_detect users wouldn't
have to do it over and over again.
Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Allocation of mem_detect extended area was not considered neither
in commit 9641b8cc733f ("s390/ipl: read IPL report at early boot")
nor in commit b2d24b97b2a9 ("s390/kernel: add support for kernel address
space layout randomization (KASLR)"). As a result mem_detect extended
theoretically may overlap with ipl report or randomized kernel image
position. But as mem_detect code will allocate extended area only
upon exceeding 255 online regions (which should alternate with offline
memory regions) it is not seen in practice.
To make sure mem_detect extended area does not overlap with ipl report
or randomized kernel position extend usage of "safe_addr". Make initrd
handling and mem_detect extended area allocation code move it further
right and make KASLR takes in into consideration as well.
Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot")
Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
doesn't consider online memory holes due to potential memory offlining
and erroneously creates pgtables for stand-by memory, which bear RW+X
attribute and trigger a warning:
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x0000000c3fffffff 49G online yes 0-48
0x0000000c40000000-0x0000000c7fffffff 1G offline 49
0x0000000c80000000-0x0000000fffffffff 14G online yes 50-63
0x0000001000000000-0x00000013ffffffff 16G offline 64-79
s390/mm: Found insecure W+X mapping at address 0xc40000000
WARNING: CPU: 14 PID: 1 at arch/s390/mm/dump_pagetables.c:142 note_page+0x2cc/0x2d8
Map only online memory ranges which fit within identity mapping limit.
Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Move Absolute Lowcore Area allocation to the decompressor.
As result, get_abs_lowcore() and put_abs_lowcore() access
brackets become really straight and do not require complex
execution context analysis and LAP and interrupts tackling.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The setup of the kernel virtual address space is spread
throughout the sources, boot stages and config options
like this:
1. The available physical memory regions are queried
and stored as mem_detect information for later use
in the decompressor.
2. Based on the physical memory availability the virtual
memory layout is established in the decompressor;
3. If CONFIG_KASAN is disabled the kernel paging setup
code populates kernel pgtables and turns DAT mode on.
It uses the information stored at step [1].
4. If CONFIG_KASAN is enabled the kernel early boot
kasan setup populates kernel pgtables and turns DAT
mode on. It uses the information stored at step [1].
The kasan setup creates early_pg_dir directory and
directly overwrites swapper_pg_dir entries to make
shadow memory pages available.
Move the kernel virtual memory setup to the decompressor
and start the kernel with DAT turned on right from the
very first istruction. That completely eliminates the
boot phase when the kernel runs in DAT-off mode, simplies
the overall design and consolidates pgtables setup.
The identity mapping is created in the decompressor, while
kasan shadow mappings are still created by the early boot
kernel code.
Share with decompressor the existing kasan memory allocator.
It decreases the size of a newly requested memory block from
pgalloc_pos and ensures that kernel image is not overwritten.
pgalloc_low and pgalloc_pos pointers are made preserved boot
variables for that.
Use the bootdata infrastructure to setup swapper_pg_dir
and invalid_pg_dir directories used by the kernel later.
The interim early_pg_dir directory established by the
kasan initialization code gets eliminated as result.
As the kernel runs in DAT-on mode only the PSW_KERNEL_BITS
define gets PSW_MASK_DAT bit by default. Additionally, the
setup_lowcore_dat_off() and setup_lowcore_dat_on() routines
get merged, since there is no DAT-off mode stage anymore.
The memory mappings are created with RW+X protection that
allows the early boot code setting up all necessary data
and services for the kernel being booted. Just before the
paging is enabled the memory protection is changed to
RO+X for text, RO+NX for read-only data and RW+NX for
kernel data and the identity mapping.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Detect and enable memory facilities which is a
prerequisite for pgtables setup in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Move declarations to appropriate header files. Instead of cryptic
casting directly assign struct vmlinux_info type to _vmlinux_info
linker script variable - wich it actually is.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Convert initial lowcore to C and use proper defines and structures to
initialize it. This should make the z/VM ipl procedure a bit less magic.
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This change simplifies the task of making the decompressor relocatable.
The decompressor's image contains special DMA sections between _sdma and
_edma. This DMA segment is loaded at boot as part of the decompressor and
then simply handed over to the decompressed kernel. The decompressor itself
never uses it in any way. The primary reason for this is the need to keep
the aforementioned DMA segment below 2GB which is required by architecture,
and because the decompressor is always loaded at a fixed low physical
address, it is guaranteed that the DMA region will not cross the 2GB
memory limit. If the DMA region had been placed in the decompressed kernel,
then KASLR would make this guarantee impossible to fulfill or it would
be restricted to the first 2GB of memory address space.
This commit moves all DMA sections between _sdma and _edma from
the decompressor's image to the decompressed kernel's image. The complete
DMA region is placed in the init section of the decompressed kernel and
immediately relocated below 2GB at start-up before it is needed by other
parts of the decompressed kernel. The relocation of the DMA region happens
even if the decompressed kernel is already located below 2GB in order
to keep the first implementation simple. The relocation should not have
any noticeable impact on boot time because the DMA segment is only a couple
of pages.
After relocating the DMA sections, the kernel has to fix all references
which point into it. In order to automate this, place all variables
pointing into the DMA sections in a special .dma.refs section. All such
variables must be defined using the new __dma_ref macro. Only variables
containing addresses within the DMA sections must be placed in the new
.dma.refs section.
Furthermore, move the initialization of control registers from
the decompressor to the decompressed kernel because some control registers
reference tables that must be placed in the DMA data section to
guarantee that their addresses are below 2G. Because the decompressed
kernel relocates the DMA sections at startup, the content of control
registers CR2, CR5 and CR15 must be updated with new addresses after
the relocation. The decompressed kernel initializes all control registers
early at boot and then updates the content of CR2, CR5 and CR15
as soon as the DMA relocation has occurred. This practically reverts
the commit a80313ff91ab ("s390/kernel: introduce .dma sections").
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
To make the decompressor relocatable, the early SCLP buffer with a fixed
address must be replaced with a relocatable C buffer of the according size
and alignment as required by SCLP.
Introduce a new function sclp_early_set_buffer() into the SCLP driver
which enables the decompressor to change the SCLP early buffer at any time.
This will be useful when the decompressor becomes fully relocatable and
might need to change the SCLP early buffer to one with an address < 2G
as required by SCLP because it was loaded at an address >= 2G.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Instead of using constant addresses for the normal and dump-info stacks,
allocate both stacks in the decompressor's image and load the stack register
in a position-independent manner.
This will allow loading and entering the decompressor at an arbitrary
memory address without corrupting the content at the fixed addresses
used until now for both stacks. This is one of the prerequisites
for being able to kexec the decompressor from its load address without
relocating it first.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
To prevent multiple incompatible declarations of symbols and to catch
such mistakes at compile time.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently there are two separate places where kernel memory layout has
to be known and adjusted:
1. early kasan setup.
2. paging setup later.
Those 2 places had to be kept in sync and adjusted to reflect peculiar
technical details of one another. With additional factors which influence
kernel memory layout like ultravisor secure storage limit, complexity
of keeping two things in sync grew up even more.
Besides that if we look forward towards creating identity mapping and
enabling DAT before jumping into uncompressed kernel - that would also
require full knowledge of and control over kernel memory layout.
So, de-duplicate and move kernel memory layout setup logic into
the decompressor.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Decompressor works on a single statically allocated stack. Stacktrace
implementation with -mbackchain just takes few lines of code.
Linux version 5.10.0-rc3-22793-g0f84a355b776-dirty (gor@tuxmaker) #27 SMP PREEMPT Mon Nov 9 17:30:18 CET 2020
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012f92 (parse_boot_command_line+0x27a/0x46c)
R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 000000000001a62c
000000000000bf60 0000000000000000 00000000000003c0 0000000000000000
0000000000000080 000000000002322d 000000007f29ef20 0000000000efd018
000000000311c000 0000000000010070 0000000000012f82 000000000000bea8
Call Trace:
(sp:000000000000bea8 [<000000000002016e>] 000000000002016e)
sp:000000000000bf18 [<0000000000012408>] startup_kernel+0x88/0x2fc
sp:000000000000bf60 [<00000000000100c4>] startup_normal+0xb0/0xb0
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The decompressor does not have any special debug means. Running the
kernel under qemu with gdb is helpful but tedious exercise if done
repeatedly. It is also not applicable to debugging under LPAR and z/VM.
One special thing which stands out is a working sclp_early_printk,
which could be used once the kernel switches to 64-bit addressing mode.
But sclp_early_printk does not provide any string formating capabilities.
Formatting and printing string without printk-alike function is a
not fun. The lack of printk-alike function means people would save up on
testing and introduce more bugs.
So, finally, provide decompressor_printk function, which fits on one
screen and trades features for simplicity.
It only supports "%s", "%x" and "%lx" specifiers and zero padding for
hex values.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently we have to consider too many different values which
in the end only affect identity mapping size. These are:
1. max_physmem_end - end of physical memory online or standby.
Always <= end of the last online memory block (get_mem_detect_end()).
2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
kernel is able to support.
3. "mem=" kernel command line option which limits physical memory usage.
4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as
crash kernel.
5. "hsa" size which is a memory limit when the kernel is executed during
zfcp/nvme dump.
Through out kernel startup and run we juggle all those values at once
but that does not bring any amusement, only confusion and complexity.
Unify all those values to a single one we should really care, that is
our identity mapping size.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
To make sure that the vmalloc area size is for almost all cases large
enough let it depend on the (potential) physical memory size. There is
still the possibility to override this with the vmalloc kernel command
line parameter.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for IBM z15 machines.
- Add SHA3 and CCA AES cipher key support in zcrypt and pkey
refactoring.
- Move to arch_stack_walk infrastructure for the stack unwinder.
- Various kasan fixes and improvements.
- Various command line parsing fixes.
- Improve decompressor phase debuggability.
- Lift no bss usage restriction for the early code.
- Use refcount_t for reference counters for couple of places in mm
code.
- Logging improvements and return code fix in vfio-ccw code.
- Couple of zpci fixes and minor refactoring.
- Remove some outdated documentation.
- Fix secure boot detection.
- Other various minor code clean ups.
* tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390: remove pointless drivers-y in drivers/s390/Makefile
s390/cpum_sf: Fix line length and format string
s390/pci: fix MSI message data
s390: add support for IBM z15 machines
s390/crypto: Support for SHA3 via CPACF (MSA6)
s390/startup: add pgm check info printing
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
vfio-ccw: fix error return code in vfio_ccw_sch_init()
s390: vfio-ap: fix warning reset not completed
s390/base: remove unused s390_base_mcck_handler
s390/sclp: Fix bit checked for has_sipl
s390/zcrypt: fix wrong handling of cca cipher keygenflags
s390/kasan: add kdump support
s390/setup: avoid using strncmp with hardcoded length
s390/sclp: avoid using strncmp with hardcoded length
s390/module: avoid using strncmp with hardcoded length
s390/pci: avoid using strncmp with hardcoded length
s390/kaslr: reserve memory for kasan usage
s390/mem_detect: provide single get_mem_detect_end
s390/cmma: reuse kstrtobool for option value parsing
...
|
|
Try to print out startup pgm check info including exact linux kernel
version, pgm interruption code and ilc, psw and general registers. Like
the following:
Linux version 5.3.0-rc7-07282-ge7b4d41d61bd-dirty (gor@tuxmaker) #3 SMP PREEMPT Thu Sep 5 16:07:34 CEST 2019
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012e52
R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 0000000000019a58
000000000000bf68 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 000000000001a041 0000000000000000
0000000004c9c000 0000000000010070 0000000000012e42 000000000000beb0
This info makes it apparent that kernel startup failed and might help
to understand what went wrong without actual standalone dump.
Printing code runs on its own stack of 1 page (at unused 0x5000), which
should be sufficient for sclp_early_printk usage (typical stack usage
observed has been around 512 bytes).
The code has pgm check recursion prevention, despite pgm check info
printing failure (follow on pgm check) or success it restores original
faulty psw and gprs and does disabled wait.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|