summaryrefslogtreecommitdiff
path: root/arch/s390/kernel/asm-offsets.c
AgeCommit message (Collapse)Author
2025-04-14s390/mm: Reimplement lazy ASCE handlingHeiko Carstens
Reduce system call overhead time (round trip time for invoking a non-existent system call) by 25%. With the removal of set_fs() [1] lazy control register handling was removed in order to keep kernel entry and exit simple. However this made system calls slower. With the conversion to generic entry [2] and numerous follow up changes which simplified the entry code significantly, adding support for lazy asce handling doesn't add much complexity to the entry code anymore. In particular this means: - On kernel entry the primary asce is not modified and contains the user asce - Kernel accesses which require secondary-space mode (for example futex operations) are surrounded by enable_sacf_uaccess() and disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce to kernel asce so that the sacf instruction can be used to switch to secondary-space mode. The primary asce is changed back to user asce with disable_sacf_uaccess(). The state of the control register which contains the primary asce is reflected with a new TIF_ASCE_PRIMARY bit. This is required on context switch so that the correct asce is restored for the scheduled in process. In result address spaces are now setup like this: CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE -----------------------------|-----------|-----------|----------- user space | user | user | kernel kernel (no sacf) | user | user | kernel kernel (during sacf uaccess) | kernel | user | kernel kernel (kvm guest execution) | guest | user | kernel In result cr1 control register content is not changed except for: - futex system calls - legacy s390 PCI system calls - the kvm specific cmpxchg_user_key() uaccess helper This leads to faster system call execution. [1] 87d598634521 ("s390/mm: remove set_fs / rework address space handling") [2] 56e62a737028 ("s390: convert to generic entry") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-03-31s390/asm-offsets: Remove ASM_OFFSETS_CHeiko Carstens
Remove ASM_OFFSETS_C which is used as guard in thread_info.h to decide if asm-offsets can be included or not. There is no reason to include asm-offsets.h in thread_info.h anymore. Remove the define and the not needed include. Explicitly include asm-offsets.h in all header files which require it, and where it used to be included implicitly via thread_info.h. This reduces header dependencies. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-31s390/asm-offsets: Include ftrace_regs.h instead of ftrace.hHeiko Carstens
Reduce header dependencies by including ftrace_regs.h and ptrace.h, which does not include other header files, instead of ftrace.h which pulls in various other header files. This is sufficient for __FTRACE_REGS_PT_REGS and __FTRACE_REGS_SIZE. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-31s390/kvm: Split kvm_host header fileHeiko Carstens
In order to generate asm offsets into kvm_s390_sie_block linux/kvm_host.h is included in asm-offsets.c. This causes quite often header dependency problems, since linux/kvm_host.h pulls in a lot of other header files. Solve this problem and split out the hardware structure declarations into a separate header file. Include only the new header file into asm-offsets.c instead of linux/kvm_host.h. This is sufficient to generate the two asm offsets required for kvm (__SIE_PROG0C and __SIE_PROG20). Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04s390/boot: Pass pt_regs to program check handlerHeiko Carstens
Setup a pt_regs structure on the stack, poplulate it in low level assembler code, and pass it to print_pgm_check_info(). This way there is no need to access then lowcore from print_pgm_check_info() anymore, and the function looks like a normal program check handler function. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04s390/asm-offsets: Rename __LC_PGM_INT_CODEHeiko Carstens
Avoid confusion and rename __LC_PGM_INT_CODE since it correlates to the pgm_code member of struct lowcore, and not the pgm_int_code member. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04s390/time: Convert MACHINE_HAS_SCC to machine_has_scc()Heiko Carstens
Use static branch(es) to implement and use machine_has_scc() instead of a runtime check via MACHINE_HAS_SCC. This comes with a cleanup of early time initialization: the initial tod_clock_base value is now passed via the bootdata mechanism, instead of using absolute lowcore as transport vehicle from the decompressor to the kernel. Also the early tod clock initialization is moved to the decompressor which allows to use a static branch with machine_has_scc() within the kernel. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-12-26fgraph: Replace fgraph_ret_regs with ftrace_regsMasami Hiramatsu (Google)
Use ftrace_regs instead of fgraph_ret_regs for tracing return value on function_graph tracer because of simplifying the callback interface. The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by CONFIG_HAVE_FUNCTION_GRAPH_FREGS. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-20Merge tag 'ftrace-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Restructure the function graph shadow stack to prepare it for use with kretprobes With the goal of merging the shadow stack logic of function graph and kretprobes, some more restructuring of the function shadow stack is required. Move out function graph specific fields from the fgraph infrastructure and store it on the new stack variables that can pass data from the entry callback to the exit callback. Hopefully, with this change, the merge of kretprobes to use fgraph shadow stacks will be ready by the next merge window. - Make shadow stack 4k instead of using PAGE_SIZE. Some architectures have very large PAGE_SIZE values which make its use for shadow stacks waste a lot of memory. - Give shadow stacks its own kmem cache. When function graph is started, every task on the system gets a shadow stack. In the future, shadow stacks may not be 4K in size. Have it have its own kmem cache so that whatever size it becomes will still be efficient in allocations. - Initialize profiler graph ops as it will be needed for new updates to fgraph - Convert to use guard(mutex) for several ftrace and fgraph functions - Add more comments and documentation - Show function return address in function graph tracer Add an option to show the caller of a function at each entry of the function graph tracer, similar to what the function tracer does. - Abstract out ftrace_regs from being used directly like pt_regs ftrace_regs was created to store a partial pt_regs. It holds only the registers and stack information to get to the function arguments and return values. On several archs, it is simply a wrapper around pt_regs. But some users would access ftrace_regs directly to get the pt_regs which will not work on all archs. Make ftrace_regs an abstract structure that requires all access to its fields be through accessor functions. - Show how long it takes to do function code modifications When code modification for function hooks happen, it always had the time recorded in how long it took to do the conversion. But this value was never exported. Recently the code was touched due to new ROX modification handling that caused a large slow down in doing the modifications and had a significant impact on boot times. Expose the timings in the dyn_ftrace_total_info file. This file was created a while ago to show information about memory usage and such to implement dynamic function tracing. It's also an appropriate file to store the timings of this modification as well. This will make it easier to see the impact of changes to code modification on boot up timings. - Other clean ups and small fixes * tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits) ftrace: Show timings of how long nop patching took ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash() ftrace: Use guard to take the ftrace_lock in release_probe() ftrace: Use guard to lock ftrace_lock in cache_mod() ftrace: Use guard for match_records() fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph() fgraph: Give ret_stack its own kmem cache fgraph: Separate size of ret_stack from PAGE_SIZE ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value selftests/ftrace: Fix check of return value in fgraph-retval.tc test ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs ftrace: Make ftrace_regs abstract from direct use fgragh: No need to invoke the function call_filter_check_discard() fgraph: Simplify return address printing in function graph tracer function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr() function_graph: Support recording and printing the function return address ftrace: Have calltime be saved in the fgraph storage ftrace: Use a running sleeptime instead of saving on shadow stack fgraph: Use fgraph data to store subtime for profiler ...
2024-10-29s390: Remove gmap pointer from lowcoreClaudio Imbrenda
Remove the gmap pointer from lowcore, since it is not used anymore. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-9-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-29s390/entry: Remove __GMAP_ASCE and use _PIF_GUEST_FAULT againClaudio Imbrenda
Now that the guest ASCE is passed as a parameter to __sie64a(), _PIF_GUEST_FAULT can be used again to determine whether the fault was a guest or host fault. Since the guest ASCE will not be taken from the gmap pointer in lowcore anymore, __GMAP_ASCE can be removed. For the same reason the guest ASCE needs now to be saved into the cr1 save area unconditionally. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-2-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-10ftrace: Make ftrace_regs abstract from direct useSteven Rostedt
ftrace_regs was created to hold registers that store information to save function parameters, return value and stack. Since it is a subset of pt_regs, it should only be used by its accessor functions. But because pt_regs can easily be taken from ftrace_regs (on most archs), it is tempting to use it directly. But when running on other architectures, it may fail to build or worse, build but crash the kernel! Instead, make struct ftrace_regs an empty structure and have the architectures define __arch_ftrace_regs and all the accessor functions will typecast to it to get to the actual fields. This will help avoid usage of ftrace_regs directly. Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/ Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-29s390/entry: Unify save_area_sync and save_area_asyncSven Schnelle
In the past two save areas existed because interrupt handlers and system call / program check handlers where entered with interrupts enabled. To prevent a handler from overwriting the save areas from the previous handler, interrupts used the async save area, while system call and program check handler used the sync save area. Since the removal of critical section cleanup from entry.S, handlers are entered with interrupts disabled. When the interrupts are re-enabled, the save area is no longer need. Therefore merge both save areas into one. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23s390/entry: Move SIE indicator flag to thread infoHeiko Carstens
CIF_SIE indicates if a thread is running in SIE context. This is the state of a thread and not the CPU. Therefore move this indicator to thread info. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23s390: Move CIF flags to struct pcpuSven Schnelle
To allow testing flags for offline CPUs, move the CIF flags to struct pcpu. To avoid having to calculate the array index for each access, add a pointer to the pcpu member for the current cpu to lowcore. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-10s390/entry: Pass the asce as parameter to sie64a()Claudio Imbrenda
Pass the guest ASCE explicitly as parameter, instead of having sie64a() take it from lowcore. This removes hidden state from lowcore, and makes things look cleaner. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Link: https://lore.kernel.org/r/20240703155900.103783-2-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-05-14s390/idle: Rewrite psw_idle() in CSven Schnelle
To ease maintenance and further enhancements, convert the psw_idle() function to C. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14s390/stackstrace: Detect vdso stack framesHeiko Carstens
Clear the backchain of the extra stack frame added by the vdso user wrapper code. This allows the user stack walker to detect and skip the non-standard stack frame. Without this an incorrect instruction pointer would be added to stack traces, and stack frame walking would be continued with a more or less random back chain. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14s390/vdso: Introduce and use struct stack_frame_vdso_wrapperHeiko Carstens
Introduce and use struct stack_frame_vdso_wrapper within vdso user wrapper code. With this structure it is possible to automatically generate an asm-offset define which can be used to save and restore the return address of the calling function. Also use STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document that the code works with user space stack frames with the standard stack frame layout. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-24s390/tracing: pass struct ftrace_regs to ftrace_trace_functionSven Schnelle
ftrace_trace_function expects a struct ftrace_regs, but the s390 architecure code passes struct pt_regs. This isn't a problem with the current code because struct ftrace_regs contains only one member: struct pt_regs. To avoid issues in the future this should be fixed. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24s390/ftrace: enable HAVE_FUNCTION_GRAPH_RETVALSven Schnelle
Add support for tracing return values in the function graph tracer. This requires return_to_handler() to record gpr2 and the frame pointer Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-03s390/entry: remove mcck clockSven Schnelle
In the past machine checks where accounted as irq time. With the conversion to generic entry, it was decided to account machine checks to the current context. The stckf at the beginning of the machine check handler and the lowcore member is no longer required, therefore remove it. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-10-26s390/entry: sort out physical vs virtual pointers usage in sie64aNico Boehr
Fix virtual vs physical address confusion (which currently are the same). sie_block is accessed in entry.S and passed it to hardware, which is why both its physical and virtual address are needed. To avoid every caller having to do the virtual-physical conversion, add a new function sie64a() which converts the virtual address to physical. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20221020143159.294605-3-nrb@linux.ibm.com Message-Id: <20221020143159.294605-3-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-06-01s390/stack: add union to reflect kvm stack slot usagesHeiko Carstens
Add a union which describes how the empty stack slots are being used by kvm and perf. This should help to avoid another bug like the one which was fixed with commit c9bfb460c3e4 ("s390/perf: obtain sie_block from the right address"). Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Tested-by: Nico Boehr <nrb@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-06-01s390/stack: merge empty stack frame slotsHeiko Carstens
Merge empty1 and empty2 arrays within the stack frame to one single array. This is possible since with commit 42b01a553a56 ("s390: always use the packed stack layout") the alternative stack frame layout is gone. Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-25s390: generate register offsets into pt_regs automaticallyHeiko Carstens
Use asm offsets method to generate register offsets into pt_regs, instead of open-coding at several places. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-03-10s390: raise minimum supported machine generation to z10Vasily Gorbik
Machine generations up to z9 (released in May 2006) have been officially out of service for several years now (z9 end of service - January 31, 2019). No distributions build kernels supporting those old machine generations anymore, except Debian, which seems to pick the oldest supported generation. The team supporting Debian on s390 has been notified about the change. Raising minimum supported machine generation to z10 helps to reduce maintenance cost and effectively remove code, which is not getting enough testing coverage due to lack of older hardware and distributions support. Besides that this unblocks some optimization opportunities and allows to use wider instruction set in asm files for future features implementation. Due to this change spectre mitigation and usercopy implementations could be drastically simplified and many newer instructions could be converted from ".insn" encoding to instruction names. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08s390/asm-offsets: remove unused definesHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-09s390/dump: fix old lowcore virtual vs physical address confusionAlexander Gordeev
Virtual addresses of vmcore_info and os_info members are wrongly passed to copy_oldmem_kernel(), while the function expects physical address of the source. Instead, __pa() macro should have been applied. Yet, use of __pa() macro could be somehow confusing, since copy_oldmem_kernel() may treat the source as an offset, not as a direct physical address (that depens from the oldmem availability and location). Fix the virtual vs physical address confusion and make the way the old lowcore is read consistent across all sources. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26s390: support command lines longer than 896 bytesSven Schnelle
Currently s390 supports a fixed maximum command line length of 896 bytes. This isn't enough as some installers are trying to pass all configuration data via kernel command line, and even with zfcp alone it is easy to generate really long command lines. Therefore extend the command line to 4 kbytes. In the parm area where the command line is stored there is no indication of the maximum allowed length, so a new field which contains the maximum length is added. The parm area has always been initialized to zero, so with old kernels this field would read zero. This is important because tools like zipl could read this field. If it contains a number larger than zero zipl knows the maximum length that can be stored in the parm area, otherwise it must assume that it is booting a legacy kernel and only 896 bytes are available. The removing of trailing whitespace in head.S is also removed because code to do this is already present in setup_boot_command_line(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26s390: add support for BEAR enhancement facilitySven Schnelle
The Breaking-Event-Address-Register (BEAR) stores the address of the last breaking event instruction. Breaking events are usually instructions that change the program flow - for example branches, and instructions that modify the address in the PSW like lpswe. This is useful for debugging wild branches, because one could easily figure out where the wild branch was originating from. What is problematic is that lpswe is considered a breaking event, and therefore overwrites BEAR on kernel exit. The BEAR enhancement facility adds new instructions that allow to save/restore BEAR and also an lpswey instruction that doesn't cause a breaking event. So we can save BEAR on kernel entry and restore it on exit to user space. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26s390: rename last_break to pgm_last_breakSven Schnelle
With the upcoming BEAR enhancements last_break isn't really unique, so rename it to pgm_last_break. This way it should be more obvious that this is the last_break value that is written by the hardware when a program check occurs. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19s390: make STACK_FRAME_OVERHEAD available via asm-offsets.hHeiko Carstens
Make STACK_FRAME_OVERHEAD available via asm-offsets.h. This allows to add s390 specific asm code to e.g. ftrace samples, without requiring to add random header files, which might cause all sort of problems on other architectures. asm-offsets.h can be assumed to be non-problematic. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-08-26s390/smp: enable DAT before CPU restart callback is calledAlexander Gordeev
The restart interrupt is triggered whenever a secondary CPU is brought online, a remote function call dispatched from another CPU or a manual PSW restart is initiated and causes the system to kdump. The handling routine is always called with DAT turned off. It then initializes the stack frame and invokes a callback. The existing callbacks handle DAT as follows: * __do_restart() and __machine_kexec() turn in on upon entry; * __ipl_run(), __reipl_run() and __dump_run() do not turn it right away, but all of them call diag308() - which turns DAT on, but only if kasan is enabled; In addition to the described complexity all callbacks (and the functions they call) should avoid kasan instrumentation while DAT is off. This update enables DAT in the assembler restart handler and relieves any callbacks (which are mostly C functions) from dealing with DAT altogether. There are four types of CPU restart that initialize control registers in different ways: 1. Start of secondary CPU on boot - control registers are inherited from the IPL CPU; 2. Restart of online CPU - control registers of the CPU being restarted are kept; 3. Hotplug of offline CPU - control registers are inherited from the starting CPU; 4. Start of offline CPU triggered by manual PSW restart - the control registers are read from the absolute lowcore and contain the boot time IPL CPU values updated with all follow-up calls of smp_ctl_set_bit() and smp_ctl_clear_bit() routines; In first three cases contents of the control registers is the most recent. In the latter case control registers are good enough to facilitate successful completion of kdump operation. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27s390/setup: generate asm offsets from struct parmareaAlexander Egorenkov
To reduce duplication, replace error-prone and hard-coded parameter area offsets with auto-generated ones. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-05s390/mcck: move register validation to C codeAlexander Gordeev
This update partially reverts commit 3037a52f9846 ("s390/nmi: do register validation as early as possible"). Storage error checks and control registers validation are left in the assembler code, since correct ASCEs and page tables are required to enable DAT - which is done before the C handler is entered. System damage, kernel instruction address and PSW MWP checks are left in the assembler code as well, since there is no way to proceed if one of these checks is failed. The getcpu vdso syscall reads CPU number from the programmable field of the TOD clock. Disregard the TOD programmable register validity bit and load the CPU number into the TOD programmable field unconditionally. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/ipl: make parameter area accessible via struct parmareaHeiko Carstens
Since commit 9a965ea95135 ("s390/kexec_file: Simplify parmarea access") we have struct parmarea which describes the layout of the kernel parameter area. Make the kernel parameter area available as global variable parmarea of type struct parmarea, which allows to easily access its members. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/facilities: move stfl information from lowcore to global dataSven Schnelle
With gcc-11, there are a lot of warnings because the facility functions are accessing lowcore through a null pointer. Fix this by moving the facility arrays away from lowcore. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07s390/entry: use assignment to read intcode / asm to copy gprsSven Schnelle
arch/s390/kernel/syscall.c: In function __do_syscall: arch/s390/kernel/syscall.c:147:9: warning: memcpy reading 64 bytes from a region of size 0 [-Wstringop-overread] 147 | memcpy(&regs->gprs[8], S390_lowcore.save_area_sync, 8 * sizeof(unsigned long)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/s390/kernel/syscall.c:148:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread] 148 | memcpy(&regs->int_code, &S390_lowcore.svc_ilc, sizeof(regs->int_code)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by moving the gprs restore from C to assembly, and use a assignment for int_code instead of memcpy. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13s390: add stack for machine check handlerSven Schnelle
The previous code used the normal kernel stack for machine checks. This is problematic when a machine check interrupts a system call or interrupt handler right at the beginning where registers are set up. Assume system_call is interrupted at the first instruction and a machine check is triggered. The machine check handler is called, checks the PSW to see whether it is coming from user space, notices that it is already in kernel mode but %r15 still contains the user space stack. This would lead to a kernel crash. There are basically two ways of fixing that: Either using the 'critical cleanup' approach which compares the address in the PSW to see whether it is already at a point where the stack has been set up, or use an extra stack for the machine check handler. For simplicity, we will go with the second approach and allocate an extra stack. This adds some memory overhead for large systems, but usually large system have plenty of memory so this isn't really a concern. But it keeps the mchk stack setup simple and less error prone. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19s390: convert to generic entrySven Schnelle
This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-11-23s390/mm: remove set_fs / rework address space handlingHeiko Carstens
Remove set_fs support from s390. With doing this rework address space handling and simplify it. As a result address spaces are now setup like this: CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE ----------------------------|-----------|-----------|----------- user space | user | user | kernel kernel, normal execution | kernel | user | kernel kernel, kvm guest execution | gmap | user | kernel To achieve this the getcpu vdso syscall is removed in order to avoid secondary address mode and a separate vdso address space in for user space. The getcpu vdso syscall will be implemented differently with a subsequent patch. The kernel accesses user space always via secondary address space. This happens in different ways: - with mvcos in home space mode and directly read/write to secondary address space - with mvcs/mvcp in primary space mode and copy from primary space to secondary space or vice versa - with e.g. cs in secondary space mode and access secondary space Switching translation modes happens with sacf before and after instructions which access user space, like before. Lazy handling of control register reloading is removed in the hope to make everything simpler, but at the cost of making kernel entry and exit a bit slower. That is: on kernel entry the primary asce is always changed to contain the kernel asce, and on kernel exit the primary asce is changed again so it contains the user asce. In kernel mode there is only one exception to the primary asce: when kvm guests are executed the primary asce contains the gmap asce (which describes the guest address space). The primary asce is reset to kernel asce whenever kvm guest execution is interrupted, so that this doesn't has to be taken into account for any user space accesses. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23s390: fix fpu restore in entry.SSven Schnelle
We need to disable interrupts in load_fpu_regs(). Otherwise an interrupt might come in after the registers are loaded, but before CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns, CIF_FPU will be cleared and the registers will never be restored. The entry.S code usually saves the interrupt state in __SF_EMPTY on the stack when disabling/restoring interrupts. sie64a however saves the pointer to the sie control block in __SF_SIE_CONTROL, which references the same location. This is non-obvious to the reader. To avoid thrashing the sie control block pointer in load_fpu_regs(), move the __SIE_* offsets eight bytes after __SF_EMPTY on the stack. Cc: <stable@vger.kernel.org> # 5.8 Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-03s390/vdso: remove unused constantsHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-26s390: convert to GENERIC_VDSOSven Schnelle
Convert s390 to generic vDSO. There are a few special things on s390: - vDSO can be called without a stack frame - glibc did this in the past. So we need to allocate a stackframe on our own. - The former assembly code used stcke to get the TOD clock and applied time steering to it. We need to do the same in the new code. This is done in the architecture specific __arch_get_hw_counter function. The steering information is stored in an architecure specific area in the vDSO data. - CPUCLOCK_VIRT is now handled with a syscall fallback, which might be slower/less accurate than the old implementation. The getcpu() function stays as an assembly function because there is no generic implementation and the code is just a few lines. Performance number from my system do 100 mio gettimeofday() calls: Plain syscall: 8.6s Generic VDSO: 1.3s old ASM VDSO: 1s So it's a bit slower but still much faster than syscalls. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-16s390/vdso: fix vDSO clock_getres()Vincenzo Frascino
clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the s390 vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Link: https://lkml.kernel.org/r/20200324121027.21665-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [heiko.carstens@de.ibm.com: use llgf for proper zero extension] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-09mm: reorder includes after introduction of linux/pgtable.hMike Rapoport
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there. import sys import re if len(sys.argv) is not 3: print "USAGE: %s <file> <header>" % (sys.argv[0]) sys.exit(1) hdr_to_move="#include <linux/%s>" % sys.argv[2] moved = False in_hdrs = False with open(sys.argv[1], "r") as f: lines = f.readlines() for _line in lines: line = _line.rstrip(' ') if line == hdr_to_move: continue if line.startswith("#include <linux/"): in_hdrs = True elif not moved and in_hdrs: moved = True print hdr_to_move print line Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: introduce include/linux/pgtable.hMike Rapoport
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-10s390: prevent leaking kernel address in BEARSven Schnelle
When userspace executes a syscall or gets interrupted, BEAR contains a kernel address when returning to userspace. This make it pretty easy to figure out where the kernel is mapped even with KASLR enabled. To fix this, add lpswe to lowcore and always execute it there, so userspace sees only the lowcore address of lpswe. For this we have to extend both critical_cleanup and the SWITCH_ASYNC macro to also check for lpswe addresses in lowcore. Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/vdso: fix getcpuHeiko Carstens
getcpu reads the required values for cpu and node with two instructions. This might lead to an inconsistent result if user space gets preempted and migrated to a different CPU between the two instructions. Fix this by using just a single instruction to read both values at once. This is currently rather a theoretical bug, since there is no real NUMA support available (except for NUMA emulation). Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>