summaryrefslogtreecommitdiff
path: root/arch/um/kernel
AgeCommit message (Collapse)Author
2024-10-26um: fix stub exe build with CONFIG_GCOVJohannes Berg
CONFIG_GCOV is special and only in UML since it builds the kernel with a "userspace" option. This is fine, but the stub is even more special and not really a full userspace process, so it then fails to link as reported. Remove the GCOV options from the stub build. For good measure, also remove the GPROF options, even though they don't seem to cause build failures now. To be able to do this, export the specific options (GCOV_OPT and GPROF_OPT) but rename them so there's less chance of any conflicts. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410242238.SXhs2kQ4-lkp@intel.com/ Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Link: https://patch.msgid.link/20241025102700.9fbb9c34473f.I7f1537fe075638f8da64beb52ef6c9e5adc51bc3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: switch to regset API and depend on XSTATEBenjamin Berg
The PTRACE_GETREGSET API has now existed since Linux 2.6.33. The XSAVE CPU feature should also be sufficiently common to be able to rely on it. With this, define our internal FP state to be the hosts XSAVE data. Add discovery for the hosts XSAVE size and place the FP registers at the end of task_struct so that we can adjust the size at runtime. Next we can implement the regset API on top and update the signal handling as well as ptrace APIs to use them. Also switch coredump creation to use the regset API and finally set HAVE_ARCH_TRACEHOOK. This considerably improves the signal frames. Previously they might not have contained all the registers (i386) and also did not have the sizes and magic values set to the correct values to permit userspace to decode the frame. As a side effect, this will permit UML to run on hosts with newer CPU extensions (such as AMX) that need even more register state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241023094120.4083426-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: insert scheduler ticks when userspace does not yieldBenjamin Berg
In time-travel mode userspace can do a lot of work without any time passing. Unfortunately, this can result in OOM situations as the RCU core code will never be run. Work around this by keeping track of userspace processes that do not yield for a lot of operations. When this happens, insert a jiffie into the sched_clock clock to account time against the process and cause the bookkeeping to run. As sched_clock is used for tracing, it is useful to keep it in sync between the different VMs. As such, try to remove added ticks again when the actual clock ticks. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Rename _PAGE_NEWPAGE to _PAGE_NEEDSYNCTiwei Bie
The _PAGE_NEWPAGE bit does not really indicate that this is a new page, but rather whether this entry needs to be synced or not. Renaming it to _PAGE_NEEDSYNC will make it more clear how everything ties together. Suggested-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Abandon the _PAGE_NEWPROT bitTiwei Bie
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will always be set. And the corresponding page will always be mapped or unmapped depending on whether the PTE is present or not. The check on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will allow us to simplify the code and remove the unreachable code. Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate initrd parameter to kernelTiwei Bie
This parameter is UML specific. It specifies the name of the file containing the initrd image, which is unknown to kernel. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-10-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate dtb parameter to kernelTiwei Bie
This parameter is UML specific and is unknown to kernel. It should not be propagated to kernel, otherwise it will be passed to user space as an environment option by kernel with a warning like: Unknown kernel command line parameters "dtb=/foo", will be passed to user space. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate mem parameter to kernelTiwei Bie
This parameter is UML specific and is unknown to kernel. It should not be propagated to kernel, otherwise it will be passed to user space as an environment option by kernel with a warning like: Unknown kernel command line parameters "mem=2G", will be passed to user space. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Remove UML specific debug parameterTiwei Bie
It does nothing but emit a warning when 'debug' is provided in the kernel command line. It can be a bit annoying, as 'debug' is also a valid kernel parameter to enable kernel debugging. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: remove fault_catcher infrastructureJohannes Berg
This was perhaps intended to do _nofault copies, but the real reason is lost to history. Remove this, it's not needed, and using longjmp() out of the middle of the signal handler with all the state it has modified is not going to be a good idea anyway. Link: https://patch.msgid.link/20241010224513.901c4d390b3e.Ia74742668b44603c1ca23dd36f90e964e6e7ee55@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: make stub_exe _start() pure inline asmJohannes Berg
Since __attribute__((naked)) cannot be used with functions containing C statements, just generate the few instructions it needs in assembly directly. While at it, fix the stack usage ("1 + 2*x - 1" is odd) and document what it must do, and why it must adjust the stack. Fixes: 8508a5e0e9db ("um: Fix misaligned stack in stub_exe") Link: https://lore.kernel.org/linux-um/CABVgOSntH-uoOFMP5HwMXjx_f1osMnVdhgKRKm4uz6DFm2Lb8Q@mail.gmail.com/ Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-18um: Fix misaligned stack in stub_exeDavid Gow
The stub_exe could segfault when built with some compilers (e.g. gcc 13.2.0), as SSE instructions which relied on stack alignment could be generated, but the stack was misaligned. This seems to be due to the __start entry point being run with a 16-byte aligned stack, but the x86_64 SYSV ABI wanting the stack to be so aligned _before_ a function call (so it is misaligned when the function is entered due to the return address being pushed). The function prologue then realigns it. Because the entry point is never _called_, and hence there is no return address, the prologue is therefore actually misaligning it, and causing the generated movaps instructions to SIGSEGV. This results in the following error: start_userspace : expected SIGSTOP, got status = 139 Don't generate this prologue for __start by using __attribute__((naked)), which resolves the issue. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: David Gow <davidgow@google.com> Link: https://lore.kernel.org/linux-um/CABVgOS=boUoG6=LHFFhxEd8H8jDP1zOaPKFEjH+iy2n2Q5S2aQ@mail.gmail.com/ Link: https://patch.msgid.link/20241017231007.1500497-2-davidgow@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-17um: Disable auto variable initialization for stub_exe.cNathan Chancellor
When automatic variable initialization is enabled via CONFIG_INIT_STACK_ALL_{PATTERN,ZERO}, clang will insert a call to memset() to initialize an object created with __builtin_alloca(). This ultimately breaks the build when linking stub_exe because it is a standalone executable that does not include or link against memset(). ld: arch/um/kernel/skas/stub_exe.o: in function `_start': arch/um/kernel/skas/stub_exe.c:83:(.ltext+0x15): undefined reference to `memset' Disable automatic variable initialization for stub_exe.c by passing the default value of 'uninitialized' to '-ftrivial-auto-var-init', which avoids generating the call to memset(). This code is small and runs quickly as it is just designed to set up an environment, so stack variable initialization is unnecessary overhead for little gain. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-2-3d6381dc5a78@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-17um: Fix passing '-n' to linker for stub_exeNathan Chancellor
When building stub_exe with clang, there is an error because '-n' is not a recognized flag by the clang driver (which is being used to invoke the linker): clang: error: unknown argument: '-n' '-n' should be passed along to the linker, as it is the short flag for '--nmagic', so prefix it with '-Wl,'. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-1-3d6381dc5a78@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Switch to 4 level page tables on 64 bitBenjamin Berg
The larger memory space is useful to support more applications inside UML. One example for this is ASAN instrumentation of userspace applications which requires addresses that would otherwise not be available. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: clear all memory in new userspace processesBenjamin Berg
With the change to use execve() we can now safely clear the memory up to STUB_START as rseq will not be trying to use memory in that region. Also, on 64 bit the previous changes should mean that there is no usable memory range above the stub. Make the change and remove the comment as it is not needed anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-10-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Discover host_task_size from envpBenjamin Berg
When loading the UML binary, the host kernel will place the stack at the highest possible address. It will then map the program name and environment variables onto the start of the stack. As such, an easy way to figure out the host_task_size is to use the highest pointer to an environment variable as a reference. Ensure that this works by disabling address layout randomization and re-executing UML in case it was enabled. This increases the available TASK_SIZE for 64 bit UML considerably. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Limit TASK_SIZE to the addressable rangeBenjamin Berg
We may have a TASK_SIZE from the host that is bigger than UML is able to address with a three-level pagetable on 64-bit. Guard against that by clipping the maximum TASK_SIZE to the maximum addressable area. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Calculate stub data address relative to stub codeBenjamin Berg
Instead of using the current stack pointer, we can also use the current instruction to calculate where the stub data is. With this the stub data only needs to be aligned to a full page boundary. Changing this has the advantage that we do not have a hole in the memory space above the stub data (which would need to be explicitly cleared). Another motivation to do this is that with the planned addition of a SECCOMP based userspace the stack pointer may not be fully trustworthy. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-7-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Add compile time assert that stub fits on a pageBenjamin Berg
The code assumes that the stub code can fit into a single page. This is unlikely to ever change, but add a link time assert instead so that there will be no hard to debug error. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Set parent death signal for userspace processBenjamin Berg
Enable PR_SET_PDEATHSIG so that the UML userspace process will be killed when the kernel exits unexpectedly. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: use execveat to create userspace MMsBenjamin Berg
Using clone will not undo features that have been enabled by libc. An example of this already happening is rseq, which could cause the kernel to read/write memory of the userspace process. In the future the standard library might also use mseal by default to protect itself, which would also thwart our attempts at unmapping everything. Solve all this by taking a step back and doing an execve into a tiny static binary that sets up the minimal environment required for the stub without using any standard library. That way we have a clean execution environment that is fully under the control of UML. Note that this changes things a bit as the FDs are not anymore shared with the kernel. Instead, we explicitly share the FDs for the physical memory and all existing iomem regions. Doing this is fine, as iomem regions cannot be added at runtime. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net [use pipe() instead of pipe2(), remove unneeded close() calls] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: remove auxiliary FP registersBenjamin Berg
We do not need the extra save/restore of the FP registers when getting the fault information. This was originally added in commit 2f56debd77a8 ("uml: fix FP register corruption") but at that time the code was not saving/restoring the FP registers when switching to userspace. This was fixed in commit fbfe9c847edf ("um: Save FPU registers between task switches") and since then the auxiliary registers have not been useful. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: always use the internal copy of the FP registersBenjamin Berg
When switching from userspace to the kernel, all registers including the FP registers are copied into the kernel and restored later on. As such, the true source for the FP register state is actually already in the kernel and they should never be grabbed from the userspace process. Change the various places to simply copy the data from the internal FP register storage area. Note that on i386 the format of PTRACE_GETFPREGS and PTRACE_GETFPXREGS is different enough that conversion would be needed. With this patch, -EINVAL is returned if the non-native format is requested. The upside is, that this patchset fixes setting registers via ptrace (which simply did not work before) as well as fixing setting floating point registers using the mcontext on signal return on i386. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240913133845.964292-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix the return value of elf_core_copy_task_fpregsTiwei Bie
This function is expected to return a boolean value, which should be true on success and false on failure. Fixes: d1254b12c93e ("uml: fix x86_64 core dump crash") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240913023302.130300-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix the definition for physmem_sizeTiwei Bie
Currently physmem_size is defined as long long but declared locally as unsigned long long before using it in separate .c files. Make them match by defining physmem_size as unsigned long long and also move the declaration to a common header to allow the compiler to check it. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Remove highmem leftoversTiwei Bie
Highmem was only supported on UML/i386. And the support has been removed by commit a98a6d864d3b ("um: Remove broken highmem support"). Remove the leftovers and stop UML from trying to setup highmem when the sum of physmem_size and iomem_size exceeds max_physmem. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-4-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix potential integer overflow during physmem setupTiwei Bie
This issue happens when the real map size is greater than LONG_MAX, which can be easily triggered on UML/i386. Fixes: fe205bdd1321 ("um: Print minimum physical memory requirement") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-09-27Merge tag 'uml-for-linus-6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Removal of dead code (TT mode leftovers, etc) - Fixes for the network vector driver - Fixes for time-travel mode * tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: fix time-travel syscall scheduling hack um: Remove outdated asm/sysrq.h header um: Remove the declaration of user_thread function um: Remove the call to SUBARCH_EXECVE1 macro um: Remove unused mm_fd field from mm_id um: Remove unused fields from thread_struct um: Remove the redundant newpage check in update_pte_range um: Remove unused kpte_clear_flush macro um: Remove obsoleted declaration for execute_syscall_skas user_mode_linux_howto_v2: add VDE vector support in doc vector_user: add VDE support um: remove ARCH_NO_PREEMPT_DYNAMIC um: vector: Fix NAPI budget handling um: vector: Replace locks guarding queue depth with atomics um: remove variable stack array in os_rcv_fd_msg()
2024-09-12um: fix time-travel syscall scheduling hackJohannes Berg
The schedule() call there really never did anything at least since the introduction of the EEVDF scheduler, but now I found a case where we permanently hang in a loop of -ERESTARTNOINTR (due to locking.) Work around it by making any syscalls with error return take time (and then schedule after) so we cannot hang in such a loop forever. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove outdated asm/sysrq.h headerTiwei Bie
This header no longer serves a purpose after show_trace was removed by commit 9d1ee8ce92e1 ("um: Rewrite show_stack()"). Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove the call to SUBARCH_EXECVE1 macroTiwei Bie
This macro has never been defined by any supported sub-architectures in tree since it was introduced by commit 1d3468a6643a ("[PATCH uml: move _kern.c files"). Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove unused mm_fd field from mm_idTiwei Bie
It's no longer used since the removal of the SKAS3/4 support. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove unused fields from thread_structTiwei Bie
These fields are no longer used since the removal of tt mode. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove the redundant newpage check in update_pte_rangeTiwei Bie
The two checks have been identical since commit ef714f15027c ("um: remove force_flush_all from fork_handler"). And the inner one isn't necessary anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-08-08Merge tag 'drm-misc-next-2024-08-01' of ↵Daniel Vetter
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-07-17printk: Add a short description string to kmsg_dump()Jocelyn Falempe
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper callback. This patch adds a new struct kmsg_dump_detail, that will hold the reason and description, and pass it to the dump() callback. To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc() function and a macro for backward compatibility. I've written this for drm_panic, but it can be useful for other kmsg_dumper. It allows to see the panic reason, like "sysrq triggered crash" or "VFS: Unable to mount root fs on xxxx" on the drm panic screen. v2: * Use a struct kmsg_dump_detail to hold the reason and description pointer, for more flexibility if we want to add other parameters. (Kees Cook) * Fix powerpc/nvram_64 build, as I didn't update the forward declaration of oops_to_nvram() Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Kees Cook <kees@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
2024-07-04um: register power-off handlerJohannes Berg
Otherwise we always get reboot: Power off not available: System halted instead which is really quite pointless. Link: https://patch.msgid.link/20240703173839.fcbb538c6686.I3d333f4773cff93c4337c4d128ee0b1b501b3dfa@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Enable preemption in UMLAnton Ivanov
Since userspace state is saved in the MM process, kernel using FPU still doesn't really need to do anything, so this really is as simple as enabling preemption. The irq critical section in sigio_handler() needs preempt_disable()/preempt_enable(). Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240702102549.d2fcea450854.I12f5a53d80ec1e425e66ef272b1e95cb523b608e@changeid [rebase, remove FPU save/restore, fix x86/um Makefile, rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: refactor TLB update handlingBenjamin Berg
Conceptually, we want the memory mappings to always be up to date and represent whatever is in the TLB. To ensure that, we need to sync them over in the userspace case and for the kernel we need to process the mappings. The kernel will call flush_tlb_* if page table entries that were valid before become invalid. Unfortunately, this is not the case if entries are added. As such, change both flush_tlb_* and set_ptes to track the memory range that has to be synchronized. For the kernel, we need to execute a flush_tlb_kern_* immediately but we can wait for the first page fault in case of set_ptes. For userspace in contrast we only store that a range of memory needs to be synced and do so whenever we switch to that process. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: simplify and consolidate TLB updatesBenjamin Berg
The HVC update was mostly used to compress consecutive calls into one. This is mostly relevant for userspace where it is already handled by the syscall stub code. Simplify the whole logic and consolidate it for both kernel and userspace. This does remove the sequential syscall compression for the kernel, however that shouldn't be the main factor in most runs. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove force_flush_all from fork_handlerBenjamin Berg
There should be no need for this. It may be that this used to work around another issue where after a clone the MM was in a bad state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Do not flush MM in flush_threadBenjamin Berg
There should be no need to flush the memory in flush_thread. Doing this likely worked around some issue where memory was still incorrectly mapped when creating or cloning an MM. With the removal of the special clone path, that isn't relevant anymore. However, add the flush into MM initialization so that any new userspace MM is guaranteed to be clean. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-10-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Delay flushing syscalls until the thread is restartedBenjamin Berg
As running the syscalls is expensive due to context switches, we should do so as late as possible in case more syscalls need to be queued later on. This will also benefit a later move to a SECCOMP enabled userspace as in that case the need for extra context switches is removed entirely. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove copy_context_skas0Benjamin Berg
The kernel flushes the memory ranges anyway for CoW and does not assume that the userspace process has anything set up already. So, start with a fresh process for the new mm context. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove LDT supportBenjamin Berg
The current LDT code has a few issues that mean it should be redone in a different way once we always start with a fresh MM even when cloning. In a new and better world, the kernel would just ensure its own LDT is clear at startup. At that point, all that is needed is a simple function to populate the LDT from another MM in arch_dup_mmap combined with some tracking of the installed LDT entries for each MM. Note that the old implementation was even incorrect with regard to reading, as it copied out the LDT entries in the internal format rather than converting them to the userspace structure. Removal should be fine as the LDT is not used for thread-local storage anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-7-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Rework syscall handlingBenjamin Berg
Rework syscall handling to be platform independent. Also create a clean split between queueing of syscalls and flushing them out, removing the need to keep state in the code that triggers the syscalls. The code adds syscall_data_len to the global mm_id structure. This will be used later to allow surrounding code to track whether syscalls still need to run and if errors occurred. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Create signal stack memory assignment in stub_dataBenjamin Berg
When we switch to use seccomp, we need both the signal stack and other data (i.e. syscall information) to co-exist in the stub data. To facilitate this, start by defining separate memory areas for the stack and syscall data. This moves the signal stack onto a new page as the memory area is not sufficient to hold both signal stack and syscall information. Only change the signal stack setup for now, as the syscall code will be reworked later. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: time-travel: remove time_exit()Johannes Berg
This function is unused and unneeded, remove it. Link: https://patch.msgid.link/20240703130105.02b3a974acb7.I7264821f7cfa17ea713b7a3e4787aa41a3107d01@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: add shared memory optimisation for time-travel=extJohannes Berg
With external time travel, a LOT of message can end up being exchanged on the socket, taking a significant amount of time just to do that. Add a new shared memory optimisation to that, where a number of changes are made: - the controller sends a client ID and a shared memory FD (and a logging FD we don't use) in the ACK message to the initial START - the shared memory holds the current time and the free_until value, so that there's no need to exchange messages for that - if the client that's running has shared memory support, any client (the running one included) can request the next time it wants to run inside the shared memory, rather than sending a message, by also updating the free_until value - when shared memory is enabled, RUN/WAIT messages no longer have an ACK, further cutting down on messages Together, this can reduce the number of messages very significantly, and reduce overall test/simulation run time. Co-developed-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Link: https://patch.msgid.link/20240702192118.6ad0a083f574.Ie41206c8ce4507fe26b991937f47e86c24ca7a31@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>