summaryrefslogtreecommitdiff
path: root/arch/s390/kernel/setup.c
AgeCommit message (Collapse)Author
2021-08-26s390/smp: enable DAT before CPU restart callback is calledAlexander Gordeev
The restart interrupt is triggered whenever a secondary CPU is brought online, a remote function call dispatched from another CPU or a manual PSW restart is initiated and causes the system to kdump. The handling routine is always called with DAT turned off. It then initializes the stack frame and invokes a callback. The existing callbacks handle DAT as follows: * __do_restart() and __machine_kexec() turn in on upon entry; * __ipl_run(), __reipl_run() and __dump_run() do not turn it right away, but all of them call diag308() - which turns DAT on, but only if kasan is enabled; In addition to the described complexity all callbacks (and the functions they call) should avoid kasan instrumentation while DAT is off. This update enables DAT in the assembler restart handler and relieves any callbacks (which are mostly C functions) from dealing with DAT altogether. There are four types of CPU restart that initialize control registers in different ways: 1. Start of secondary CPU on boot - control registers are inherited from the IPL CPU; 2. Restart of online CPU - control registers of the CPU being restarted are kept; 3. Hotplug of offline CPU - control registers are inherited from the starting CPU; 4. Start of offline CPU triggered by manual PSW restart - the control registers are read from the absolute lowcore and contain the boot time IPL CPU values updated with all follow-up calls of smp_ctl_set_bit() and smp_ctl_clear_bit() routines; In first three cases contents of the control registers is the most recent. In the latter case control registers are good enough to facilitate successful completion of kdump operation. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-18s390/sclp: reserve memory occupied by sclp early bufferAlexander Egorenkov
The memory block occupied by the SCLP early buffer that is allocated by the decompressor and then handed over to the decompressed kernel, must be reserved to prevent it from being reused for other purposes. This is necessary because the SCLP early buffer is still in use during kernel initialization. Fixes: f1d3c5323772 ("s390/boot: move sclp early buffer from fixed address in asm to C") Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-08-05s390: rename dma section to amode31Heiko Carstens
The dma section name is confusing, since the code which resides within that section has nothing to do with direct memory access. Instead the limitation is that the code has to run in 31 bit addressing mode, and therefore has to reside below 2GB. So the name was chosen since ZONE_DMA is the same region. To reduce confusion rename the section to amode31, which hopefully describes better what this is about. Note: this will also change vmcoreinfo strings - SDMA=... gets renamed to SAMODE31=... - EDMA=... gets renamed to EAMODE31=... Acked-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/hwcaps: move setup_hwcaps()Heiko Carstens
Move setup_hwcaps() to processor.c for two reasons: - make setup.c a bit smaller - have allmost all of the hwcap code in one file Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/hwcaps: shorten HWCAP definesHeiko Carstens
Remove s390 part of all HWCAP defines, just to make them shorter and easier to handle. The namespace is anyway per architecture. This is similar to what arm64 has. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390: add HWCAP_S390_PCI_MIO to ELF hwcapsNiklas Schnelle
In order to support the use of enhanced PCI instructions in both kernel- and userspace we need both hardware support and proper setup in the kernel. The latter can be toggled off with the pci=nomio command line option. Thus availability of this feature in userspace depends on all of kernel configuration (CONFIG_PCI), hardware support and the current kernel command line and can thus not rely solely on a facility bit. Instead let's introduce a new ELF hardware capability bit HWCAP_S390_PCI_MIO to tell userspace whether these PCI instructions can be used. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390: report more CPU capabilitiesHeiko Carstens
Add hardware capability bits and feature tags to /proc/cpuinfo for NNPA and Vector-Packed-Decimal-Enhancement Facility 2. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/setup: don't reserve memory that occupied decompressor's headAlexander Egorenkov
There is no useful information within [STARTUP_NORMAL_OFFSET, HEAD_END] now. But the memory region [0, STARTUP_NORMAL_OFFSET] is used by: * lowcore * kdump for swapping memory * stand-alone zipl dumpers for code, data, stack and heap Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/boot: move dma sections from decompressor to decompressed kernelAlexander Egorenkov
This change simplifies the task of making the decompressor relocatable. The decompressor's image contains special DMA sections between _sdma and _edma. This DMA segment is loaded at boot as part of the decompressor and then simply handed over to the decompressed kernel. The decompressor itself never uses it in any way. The primary reason for this is the need to keep the aforementioned DMA segment below 2GB which is required by architecture, and because the decompressor is always loaded at a fixed low physical address, it is guaranteed that the DMA region will not cross the 2GB memory limit. If the DMA region had been placed in the decompressed kernel, then KASLR would make this guarantee impossible to fulfill or it would be restricted to the first 2GB of memory address space. This commit moves all DMA sections between _sdma and _edma from the decompressor's image to the decompressed kernel's image. The complete DMA region is placed in the init section of the decompressed kernel and immediately relocated below 2GB at start-up before it is needed by other parts of the decompressed kernel. The relocation of the DMA region happens even if the decompressed kernel is already located below 2GB in order to keep the first implementation simple. The relocation should not have any noticeable impact on boot time because the DMA segment is only a couple of pages. After relocating the DMA sections, the kernel has to fix all references which point into it. In order to automate this, place all variables pointing into the DMA sections in a special .dma.refs section. All such variables must be defined using the new __dma_ref macro. Only variables containing addresses within the DMA sections must be placed in the new .dma.refs section. Furthermore, move the initialization of control registers from the decompressor to the decompressed kernel because some control registers reference tables that must be placed in the DMA data section to guarantee that their addresses are below 2G. Because the decompressed kernel relocates the DMA sections at startup, the content of control registers CR2, CR5 and CR15 must be updated with new addresses after the relocation. The decompressed kernel initializes all control registers early at boot and then updates the content of CR2, CR5 and CR15 as soon as the DMA relocation has occurred. This practically reverts the commit a80313ff91ab ("s390/kernel: introduce .dma sections"). Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/dump: introduce boot data 'oldmem_data'Alexander Egorenkov
The new boot data struct shall replace global variables OLDMEM_BASE and OLDMEM_SIZE. It is initialized in the decompressor and passed to the decompressed kernel. In comparison to the old solution, this one doesn't access data at fixed physical addresses which will become important when the decompressor becomes relocatable. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/boot: introduce boot data 'initrd_data'Alexander Egorenkov
The new boot data struct shall replace global variables INITRD_START and INITRD_SIZE. It is initialized in the decompressor and passed to the decompressed kernel. In comparison to the old solution, this one doesn't access data at fixed physical addresses which will become important when the decompressor becomes relocatable. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-10Merge tag 's390-5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix preempt_count initialization. - Rework call_on_stack() macro to add proper type handling and avoid possible register corruption. - More error prone "register asm" removal and fixes. - Fix syscall restarting when multiple signals are coming in. This adds minimalistic trampolines to vdso so we can return from signal without using the stack which requires pgm check handler hacks when NX is enabled. - Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after switch to generic entry. - Fix protected virtualization secure storage access exception handling. - Make machine check C handler always enter with DAT enabled and move register validation to C code. - Fix tinyconfig boot problem by avoiding MONITOR CALL without CONFIG_BUG. - Increase asm symbols alignment to 16 to make it consistent with compilers. - Enable concurrent access to the CPU Measurement Counter Facility. - Add support for dynamic AP bus size limit and rework ap_dqap to deal with messages greater than recv buffer. * tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390: preempt: Fix preempt_count initialization s390/linkage: increase asm symbols alignment to 16 s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn() s390: add type checking to CALL_ON_STACK_NORETURN() macro s390: remove old CALL_ON_STACK() macro s390/softirq: use call_on_stack() macro s390/lib: use call_on_stack() macro s390/smp: use call_on_stack() macro s390/kexec: use call_on_stack() macro s390/irq: use call_on_stack() macro s390/mm: use call_on_stack() macro s390: introduce proper type handling call_on_stack() macro s390/irq: simplify on_async_stack() s390/irq: inline do_softirq_own_stack() s390/irq: simplify do_softirq_own_stack() s390/ap: get rid of register asm in ap_dqap() s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTART s390: move restart of execve() syscall s390/signal: remove sigreturn on stack s390/signal: switch to using vdso for sigreturn and syscall restart ...
2021-07-08s390: preempt: Fix preempt_count initializationValentin Schneider
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the preempt_count of the requested CPU's idle task: it unconditionally writes to the current CPU's. This clearly conflicts with idle_threads_init(), which intends to initialize *all* the idle tasks, including their preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count). Unfortunately, it seems the way s390 does things doesn't let us initialize every possible CPU's preempt_count early on, as the pages where this resides are only allocated when a CPU is brought up and are freed when it is brought down. Let the arch-specific code set a CPU's preempt_count when its lowcore is allocated, and turn init_idle_preempt_count() into an empty stub. Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()Heiko Carstens
Lower case matches the call_on_stack() macro and is easier to read. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: convert to setup_initial_init_mm()Kefeng Wang
Use setup_initial_init_mm() helper to simplify code. Link: https://lkml.kernel.org/r/20210608083418.137226-14-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-18s390/setup: cleanup reserve/remove_oldmemVasily Gorbik
Since OLDMEM_BASE/OLDMEM_SIZE is already taken into consideration and is reflected in ident_map_size. reserve/remove_oldmem() is no longer needed and could be removed. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18s390: setup kernel memory layout earlyVasily Gorbik
Currently there are two separate places where kernel memory layout has to be known and adjusted: 1. early kasan setup. 2. paging setup later. Those 2 places had to be kept in sync and adjusted to reflect peculiar technical details of one another. With additional factors which influence kernel memory layout like ultravisor secure storage limit, complexity of keeping two things in sync grew up even more. Besides that if we look forward towards creating identity mapping and enabling DAT before jumping into uncompressed kernel - that would also require full knowledge of and control over kernel memory layout. So, de-duplicate and move kernel memory layout setup logic into the decompressor. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/smp: reallocate IPL CPU lowcoreAlexander Gordeev
The lowcore for IPL CPU is special. It is allocated early in the boot process using memblock and never freed since. The reason is pcpu_alloc_lowcore() and pcpu_free_lowcore() routines use page allocator which is not available when the IPL CPU is getting initialized. Similar problem is already addressed for stacks - once the virtual memory is available the early boot stacks get re- allocated. Doing the same for lowcore will allow freeing the IPL CPU lowcore and make no difference between the boot and secondary CPUs. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/sclp_vt220: fix console name to match deviceValentin Vidic
Console name reported in /proc/consoles: ttyS1 -W- (EC p ) 4:65 does not match the char device name: crw--w---- 1 root root 4, 65 May 17 12:18 /dev/ttysclp0 so debian-installer inside a QEMU s390x instance gets confused and fails to start with the following error: steal-ctty: No such file or directory Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Link: https://lore.kernel.org/r/20210427194010.9330-1-vvidic@valentin-vidic.from.hr Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/facilities: move stfl information from lowcore to global dataSven Schnelle
With gcc-11, there are a lot of warnings because the facility functions are accessing lowcore through a null pointer. Fix this by moving the facility arrays away from lowcore. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-05-04s390: fix detection of vector enhancements facility 1 vs. vector packed ↵David Hildenbrand
decimal facility The PoP documents: 134: The vector packed decimal facility is installed in the z/Architecture architectural mode. When bit 134 is one, bit 129 is also one. 135: The vector enhancements facility 1 is installed in the z/Architecture architectural mode. When bit 135 is one, bit 129 is also one. Looks like we confuse the vector enhancements facility 1 ("EXT") with the Vector packed decimal facility ("BCD"). Let's fix the facility checks. Detected while working on QEMU/tcg z14 support and only unlocking the vector enhancements facility 1, but not the vector packed decimal facility. Fixes: 2583b848cad0 ("s390: report new vector facilities") Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20210503121244.25232-1-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-07s390/setup: use memblock_free_late() to free old stackHeiko Carstens
Use memblock_free_late() to free the old machine check stack to the buddy allocator instead of leaking it. Fixes: b61b1595124a ("s390: add stack for machine check handler") Cc: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-02-13s390: add stack for machine check handlerSven Schnelle
The previous code used the normal kernel stack for machine checks. This is problematic when a machine check interrupts a system call or interrupt handler right at the beginning where registers are set up. Assume system_call is interrupted at the first instruction and a machine check is triggered. The machine check handler is called, checks the PSW to see whether it is coming from user space, notices that it is already in kernel mode but %r15 still contains the user space stack. This would lead to a kernel crash. There are basically two ways of fixing that: Either using the 'critical cleanup' approach which compares the address in the PSW to see whether it is already at a point where the stack has been set up, or use an extra stack for the machine check handler. For simplicity, we will go with the second approach and allocate an extra stack. This adds some memory overhead for large systems, but usually large system have plenty of memory so this isn't really a concern. But it keeps the mchk stack setup simple and less error prone. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13s390: use WRITE_ONCE when re-allocating async stackSven Schnelle
The code does: S390_lowcore.async_stack = new + STACK_INIT_OFFSET; But the compiler is free to first assign one value and add the other value later. If a IRQ would be coming in between these two operations, it would run with an invalid stack. Prevent this by using WRITE_ONCE. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19s390: convert to generic entrySven Schnelle
This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-12-16s390/delay: remove udelay_simple()Heiko Carstens
udelay_simple() callers can make use of the now simplified udelay() implementation. No need to keep it. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-10s390/mm: add support to allocate gigantic hugepages using CMAGerald Schaefer
Commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") added support for allocating gigantic hugepages using CMA, by specifying the hugetlb_cma= kernel parameter, which will disable any boot-time allocation of gigantic hugepages. This patch enables that option also for s390. Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20s390: unify identity mapping limits handlingVasily Gorbik
Currently we have to consider too many different values which in the end only affect identity mapping size. These are: 1. max_physmem_end - end of physical memory online or standby. Always <= end of the last online memory block (get_mem_detect_end()). 2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the kernel is able to support. 3. "mem=" kernel command line option which limits physical memory usage. 4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as crash kernel. 5. "hsa" size which is a memory limit when the kernel is executed during zfcp/nvme dump. Through out kernel startup and run we juggle all those values at once but that does not bring any amusement, only confusion and complexity. Unify all those values to a single one we should really care, that is our identity mapping size. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09s390: make sure vmemmap is top region table entry alignedVasily Gorbik
Since commit 29d37e5b82f3 ("s390/protvirt: add ultravisor initialization") vmax is adjusted to the ultravisor secure storage limit. This limit is currently applied when 4-level paging is used. Later vmax is also used to align vmemmap address to the top region table entry border. When vmax is set to the ultravisor secure storage limit this is no longer the case. Instead of changing vmax, make only MODULES_END be affected by the secure storage limit, so that vmax stays intact for further vmemmap address alignment. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09s390/udelay: make it work for the early codeVasily Gorbik
Currently udelay relies on working EXT interrupts handler, which is not the case during early startup. In such cases udelay_simple() has to be used instead. To avoid mistakes of calling udelay too early, which could happen from the common code as well - make udelay work for the early code by introducing static branch and redirecting all udelay calls to udelay_simple until EXT interrupts handler is fully initialized and async stack is allocated. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-10-16Merge tag 's390-5.10-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove address space overrides using set_fs() - Convert to generic vDSO - Convert to generic page table dumper - Add ARCH_HAS_DEBUG_WX support - Add leap seconds handling support - Add NVMe firmware-assisted kernel dump support - Extend NVMe boot support with memory clearing control and addition of kernel parameters - AP bus and zcrypt api code rework. Add adapter configure/deconfigure interface. Extend debug features. Add failure injection support - Add ECC secure private keys support - Add KASan support for running protected virtualization host with 4-level paging - Utilize destroy page ultravisor call to speed up secure guests shutdown - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code - Various checksum improvements - Other small various fixes and improvements all over the code * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits) s390/uaccess: fix indentation s390/uaccess: add default cases for __put_user_fn()/__get_user_fn() s390/zcrypt: fix wrong format specifications s390/kprobes: move insn_page to text segment s390/sie: fix typo in SIGP code description s390/lib: fix kernel doc for memcmp() s390/zcrypt: Introduce Failure Injection feature s390/zcrypt: move ap_msg param one level up the call chain s390/ap/zcrypt: revisit ap and zcrypt error handling s390/ap: Support AP card SCLP config and deconfig operations s390/sclp: Add support for SCLP AP adapter config/deconfig s390/ap: add card/queue deconfig state s390/ap: add error response code field for ap queue devices s390/ap: split ap queue state machine state from device state s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG s390/zcrypt: introduce msg tracking in zcrypt functions s390/startup: correct early pgm check info formatting s390: remove orphaned extern variables declarations s390/kasan: make sure int handler always run with DAT on s390/ipl: add support to control memory clearing for nvme re-IPL ...
2020-10-15Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - rework the non-coherent DMA allocator - move private definitions out of <linux/dma-mapping.h> - lower CMA_ALIGNMENT (Paul Cercueil) - remove the omap1 dma address translation in favor of the common code - make dma-direct aware of multiple dma offset ranges (Jim Quinlan) - support per-node DMA CMA areas (Barry Song) - increase the default seg boundary limit (Nicolin Chen) - misc fixes (Robin Murphy, Thomas Tai, Xu Wang) - various cleanups * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits) ARM/ixp4xx: add a missing include of dma-map-ops.h dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling dma-direct: factor out a dma_direct_alloc_from_pool helper dma-direct check for highmem pages in dma_direct_alloc_pages dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h> dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma dma-mapping: move dma-debug.h to kernel/dma/ dma-mapping: remove <asm/dma-contiguous.h> dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h> dma-contiguous: remove dma_contiguous_set_default dma-contiguous: remove dev_set_cma_area dma-contiguous: remove dma_declare_contiguous dma-mapping: split <linux/dma-mapping.h> cma: decrease CMA_ALIGNMENT lower limit to 2 firewire-ohci: use dma_alloc_pages dma-iommu: implement ->alloc_noncoherent dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods dma-mapping: add a new dma_alloc_pages API dma-mapping: remove dma_cache_sync 53c700: convert to dma_alloc_noncoherent ...
2020-10-13arch, drivers: replace for_each_membock() with for_each_mem_range()Mike Rapoport
There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start = __pfn_to_phys(memblock_region_memory_base_pfn(reg); end = __pfn_to_phys(memblock_region_memory_end_pfn(reg)); /* do something with start and end */ } Using for_each_mem_range() iterator is more appropriate in such cases and allows simpler and cleaner code. [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build] [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring] Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13memblock: make memblock_debug and related functionality privateMike Rapoport
The only user of memblock_dbg() outside memblock was s390 setup code and it is converted to use pr_debug() instead. This allows to stop exposing memblock_debug and memblock_dbg() to the rest of the kernel. [akpm@linux-foundation.org: make memblock_dbg() safer and neater] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>Christoph Hellwig
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment describing the contiguous allocator into kernel/dma/contigous.c. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-02s390/kasan: make sure int handler always run with DAT onVasily Gorbik
Since commit 998f5bbe3dbd ("s390/kasan: fix early pgm check handler execution") early pgm check handler is executed with DAT on if Kasan is enabled. Still there is a window between setup_lowcore_dat_off() and setup_lowcore_dat_on() when int handlers could be executed with DAT off under Kasan. If this happens the kernel ends up in pgm check loop due to Kasan shadow memory access attempts. With Kasan enabled paging is initialized much earlier and DAT flag has to be on at all times instrumented code is executed. Make sure int handlers are set up to be called with DAT on right away in this case. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-10-02s390/nvme: support firmware-assisted dump to NVMe disksAlexander Egorenkov
From the kernel perspective NVMe dump works exactly like zFCP dump. Therefore, adapt all places where code explicitly tests only for IPL of type FCP DUMP. And also set the memory end correctly in this case. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-29s390: remove unused _swsusp_reset_dmaVasily Gorbik
Since commit 394216275c7d ("s390: remove broken hibernate / power management support") _swsusp_reset_dma is unused and could be safely removed. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16s390/kasan: support protvirt with 4-level pagingVasily Gorbik
Currently the kernel crashes in Kasan instrumentation code if CONFIG_KASAN_S390_4_LEVEL_PAGING is used on protected virtualization capable machine where the ultravisor imposes addressing limitations on the host and those limitations are lower then KASAN_SHADOW_OFFSET. The problem is that Kasan has to know in advance where vmalloc/modules areas would be. With protected virtualization enabled vmalloc/modules areas are moved down to the ultravisor secure storage limit while kasan still expects them at the very end of 4-level paging address space. To fix that make Kasan recognize when protected virtualization is enabled and predefine vmalloc/modules areas position which are compliant with ultravisor secure storage limit. Kasan shadow itself stays in place and might reside above that ultravisor secure storage limit. One slight difference compaired to a kernel without Kasan enabled is that vmalloc/modules areas position is not reverted to default if ultravisor initialization fails. It would still be below the ultravisor secure storage limit. Kernel layout with kasan, 4-level paging and protected virtualization enabled (ultravisor secure storage limit is at 0x0000800000000000): ---[ vmemmap Area Start ]--- 0x0000400000000000-0x0000400080000000 ---[ vmemmap Area End ]--- ---[ vmalloc Area Start ]--- 0x00007fe000000000-0x00007fff80000000 ---[ vmalloc Area End ]--- ---[ Modules Area Start ]--- 0x00007fff80000000-0x0000800000000000 ---[ Modules Area End ]--- ---[ Kasan Shadow Start ]--- 0x0018000000000000-0x001c000000000000 ---[ Kasan Shadow End ]--- 0x001c000000000000-0x0020000000000000 1P PGD I Kernel layout with kasan, 4-level paging and protected virtualization disabled/unsupported: ---[ vmemmap Area Start ]--- 0x0000400000000000-0x0000400060000000 ---[ vmemmap Area End ]--- ---[ Kasan Shadow Start ]--- 0x0018000000000000-0x001c000000000000 ---[ Kasan Shadow End ]--- ---[ vmalloc Area Start ]--- 0x001fffe000000000-0x001fffff80000000 ---[ vmalloc Area End ]--- ---[ Modules Area Start ]--- 0x001fffff80000000-0x0020000000000000 ---[ Modules Area End ]--- Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16s390/protvirt: parse prot_virt option in the decompressorVasily Gorbik
To make early kernel address space layout definition possible parse prot_virt option in the decompressor and pass it to the uncompressed kernel. This enables kasan to take ultravisor secure storage limit into consideration and pre-define vmalloc position correctly. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16s390/kasan: avoid unnecessary moving of vmemmapVasily Gorbik
Currently vmemmap area is unconditionally moved beyond Kasan shadow memory. When Kasan is not enabled vmemmap area position is calculated in setup_memory_end() and depends on limiting factors like ultravisor secure storage limit. Try to follow the same logic with Kasan enabled as well and avoid unnecessary vmemmap area position changes unless it really intersects with Kasan shadow. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16s390/boot: enable .bss section for compressed kernelAlexander Egorenkov
- Support static uninitialized variables in compressed kernel. - Remove chkbss script - Get rid of workarounds for not having .bss section Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/mm,ptdump: add couple of additional markersVasily Gorbik
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> [hca@linux.ibm.com: add more markers, rename some markers] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/pci: Implement ioremap_wc/prot() with MIONiklas Schnelle
With our current support for the new MIO PCI instructions, write combining/write back MMIO memory can be obtained via the pci_iomap_wc() and pci_iomap_wc_range() functions. This is achieved by using the write back address for a specific bar as provided in clp_store_query_pci_fn() These functions are however not widely used and instead drivers often rely on ioremap_wc() and ioremap_prot(), which on other platforms enable write combining using a PTE flag set through the pgrprot value. While we do not have a write combining flag in the low order flag bits of the PTE like x86_64 does, with MIO support, there is a write back bit in the physical address (bit 1 on z15) and thus also the PTE. Which bit is used to toggle write back and whether it is available at all, is however not fixed in the architecture. Instead we get this information from the CLP Store Logical Processor Characteristics for PCI command. When the write back bit is not provided we fall back to the existing behavior. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/init: add missing __init annotationsIlya Leoshkevich
Add __init to reserve_memory_end, reserve_oldmem and remove_oldmem. Sometimes these functions are not inlined, and then the build complains about section mismatch. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26s390: convert to GENERIC_VDSOSven Schnelle
Convert s390 to generic vDSO. There are a few special things on s390: - vDSO can be called without a stack frame - glibc did this in the past. So we need to allocate a stackframe on our own. - The former assembly code used stcke to get the TOD clock and applied time steering to it. We need to do the same in the new code. This is done in the architecture specific __arch_get_hw_counter function. The steering information is stored in an architecure specific area in the vDSO data. - CPUCLOCK_VIRT is now handled with a syscall fallback, which might be slower/less accurate than the old implementation. The getcpu() function stays as an assembly function because there is no generic implementation and the code is just a few lines. Performance number from my system do 100 mio gettimeofday() calls: Plain syscall: 8.6s Generic VDSO: 1.3s old ASM VDSO: 1s So it's a bit slower but still much faster than syscalls. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "s390: - implement diag318 x86: - Report last CPU for debugging - Emulate smaller MAXPHYADDR in the guest than in the host - .noinstr and tracing fixes from Thomas - nested SVM page table switching optimization and fixes Generic: - Unify shadow MMU cache data structures across architectures" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) KVM: SVM: Fix sev_pin_memory() error handling KVM: LAPIC: Set the TDCR settable bits KVM: x86: Specify max TDP level via kvm_configure_mmu() KVM: x86/mmu: Rename max_page_level to max_huge_page_level KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch KVM: x86: Pull the PGD's level from the MMU instead of recalculating it KVM: VMX: Make vmx_load_mmu_pgd() static KVM: x86/mmu: Add separate helper for shadow NPT root page role calc KVM: VMX: Drop a duplicate declaration of construct_eptp() KVM: nSVM: Correctly set the shadow NPT root level in its MMU role KVM: Using macros instead of magic values MIPS: KVM: Fix build error caused by 'kvm_run' cleanup KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support KVM: VMX: optimize #PF injection when MAXPHYADDR does not match KVM: VMX: Add guest physical address check in EPT violation and misconfig KVM: VMX: introduce vmx_need_pf_intercept KVM: x86: update exception bitmap on CPUID changes KVM: x86: rename update_bp_intercept to update_exception_bitmap ...
2020-08-03Merge tag 's390-5.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add support for function error injection. - Add support for custom exception handlers, as required by BPF_PROBE_MEM. - Add support for BPF_PROBE_MEM. - Add trace events for idle enter / exit for the s390 specific idle implementation. - Remove unused zcore memmmap device. - Remove unused "raw view" from s390 debug feature. - AP bus + zcrypt device driver code refactoring. - Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver. - Expose only minimal interface to walk physmem for mm/memblock. This is a common code change and it has been agreed on with Mike Rapoport and Andrew Morton that this can go upstream via the s390 tree. - Rework of the s390 vmem/vmmemap code to allow for future memory hot remove. - Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10 allocations again, instead of only order-8 allocations. - Various small improvements and fixes. * tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/vmemmap: coding style updates s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections s390/vmemmap: remember unused sub-pmd ranges s390/vmemmap: fallback to PTEs if mapping large PMD fails s390/vmem: cleanup empty page tables s390/vmemmap: take the vmem_mutex when populating/freeing s390/vmemmap: cleanup when vmemmap_populate() fails s390/vmemmap: extend modify_pagetable() to handle vmemmap s390/vmem: consolidate vmem_add_range() and vmem_remove_range() s390/vmem: rename vmem_add_mem() to vmem_add_range() s390: enable HAVE_FUNCTION_ERROR_INJECTION s390/pci: clarify comment in s390_mmio_read/write s390/time: improve comparison for tod steering s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE s390/time: use CLOCKSOURCE_MASK s390/bpf: implement BPF_PROBE_MEM s390/kernel: expand exception table logic to allow new handling options s390/kernel: unify EX_TABLE* implementations s390/mm: allow order 10 allocations s390/mm: avoid trimming to MAX_ORDER ...
2020-07-20s390/mm: avoid trimming to MAX_ORDERHeiko Carstens
Trimming to MAX_ORDER was originally done in order to avoid to set HOLES_IN_ZONE, which in turn would enable a quite expensive pfn_valid() check. pfn_valid() however only checks if a struct page exists for a given pfn. With sparsemen vmemmap there are always struct pages, since memmaps are allocated for whole sections. Therefore remove the HOLES_IN_ZONE comment and the trimming. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-06-29s390/setup: init jump labels before command line parsingVasily Gorbik
Command line parameters might set static keys. This is true for s390 at least since commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options"). To avoid the following WARN: static_key_enable_cpuslocked(): static key 'init_on_alloc+0x0/0x40' used before call to jump_label_init() call jump_label_init() just before parse_early_param(). jump_label_init() is safe to call multiple times (x86 does that), doesn't do any memory allocations and hence should be safe to call that early. Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Cc: <stable@vger.kernel.org> # 5.3: d6df52e9996d: s390/maccess: add no DAT mode to kernel_write Cc: <stable@vger.kernel.org> # 5.3 Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>