summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
9 dayss390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL againIlya Leoshkevich
commit 6a5abf8cf182f577c7ae6c62f14debc9754ec986 upstream. Commit 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") has accidentally removed the critical piece of commit c730fce7c70c ("s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL"), causing intermittent kernel panics in e.g. perf's on_switch() prog to reappear. Restore the fix and add a comment. Fixes: 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") Cc: stable@vger.kernel.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250716194524.48109-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17crypto: s390/sha - Fix uninitialized variable in SHA-1 and SHA-2Eric Biggers
commit 68279380266a5fa70e664de754503338e2ec3f43 upstream. Commit 88c02b3f79a6 ("s390/sha3: Support sha3 performance enhancements") added the field s390_sha_ctx::first_message_part and made it be used by s390_sha_update() (now s390_sha_update_blocks()). At the time, s390_sha_update() was used by all the s390 SHA-1, SHA-2, and SHA-3 algorithms. However, only the initialization functions for SHA-3 were updated, leaving SHA-1 and SHA-2 using first_message_part uninitialized. This could cause e.g. the function code CPACF_KIMD_SHA_512 | CPACF_KIMD_NIP to be used instead of just CPACF_KIMD_SHA_512. This apparently was harmless, as the SHA-1 and SHA-2 function codes ignore CPACF_KIMD_NIP; it is recognized only by the SHA-3 function codes (https://lore.kernel.org/r/73477fe9-a1dc-4e38-98a6-eba9921e8afa@linux.ibm.com/). Therefore, this bug was found only when first_message_part was later converted to a boolean and UBSAN detected its uninitialized use. Regardless, let's fix this by just initializing to zero. Note: in 6.16, we need to patch SHA-1, SHA-384, and SHA-512. In 6.15 and earlier, we'll also need to patch SHA-224 and SHA-256, as they hadn't yet been librarified (which incidentally fixed this bug). Fixes: 88c02b3f79a6 ("s390/sha3: Support sha3 performance enhancements") Cc: stable@vger.kernel.org Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Closes: https://lore.kernel.org/r/12740696-595c-4604-873e-aefe8b405fbf@linux.ibm.com Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20250703172316.7914-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10s390/pci: Do not try re-enabling load/store if device is disabledNiklas Schnelle
commit b97a7972b1f4f81417840b9a2ab0c19722b577d5 upstream. If a device is disabled unblocking load/store on its own is not useful as a full re-enable of the function is necessary anyway. Note that SCLP Write Event Data Action Qualifier 0 (Reset) leaves the device disabled and triggers this case unless the driver already requests a reset. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10s390/pci: Fix stale function handles in error handlingNiklas Schnelle
commit 45537926dd2aaa9190ac0fac5a0fbeefcadfea95 upstream. The error event information for PCI error events contains a function handle for the respective function. This handle is generally captured at the time the error event was recorded. Due to delays in processing or cascading issues, it may happen that during firmware recovery multiple events are generated. When processing these events in order Linux may already have recovered an affected function making the event information stale. Fix this by doing an unconditional CLP List PCI function retrieving the current function handle with the zdev->state_lock held and ignoring the event if its function handle is stale. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Fix __pcilg_mio_inuser() inline assemblyHeiko Carstens
commit c4abe6234246c75cdc43326415d9cff88b7cf06c upstream. Use "a" constraint for the shift operand of the __pcilg_mio_inuser() inline assembly. The used "d" constraint allows the compiler to use any general purpose register for the shift operand, including register zero. If register zero is used this my result in incorrect code generation: 8f6: a7 0a ff f8 ahi %r0,-8 8fa: eb 32 00 00 00 0c srlg %r3,%r2,0 <---- If register zero is selected to contain the shift value, the srlg instruction ignores the contents of the register and always shifts zero bits. Therefore use the "a" constraint which does not permit to select register zero. Fixes: f058599e22d5 ("s390/pci: Fix s390_mmio_read/write with MIO") Cc: stable@vger.kernel.org Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMYLorenzo Stoakes
commit 15ac613f124e51a6623975efad9657b1f3ee47e7 upstream. The enum type prot_type declared in arch/s390/kvm/gaccess.c declares an unfortunate identifier within it - PROT_NONE. This clashes with the protection bit define from the uapi for mmap() declared in include/uapi/asm-generic/mman-common.h, which is indeed what those casually reading this code would assume this to refer to. This means that any changes which subsequently alter headers in any way which results in the uapi header being imported here will cause build errors. Resolve the issue by renaming PROT_NONE to PROT_TYPE_DUMMY. Link: https://lkml.kernel.org/r/20250519145657.178365-1-lorenzo.stoakes@oracle.com Fixes: b3cefd6bf16e ("KVM: s390: Pass initialized arg even if unused") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505140943.IgHDa9s7-lkp@intel.com/ Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com> Acked-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: <stable@vger.kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: James Houghton <jthoughton@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Serialize device addition and removalNiklas Schnelle
commit 774a1fa880bc949d88b5ddec9494a13be733dfa8 upstream. Prior changes ensured that when zpci_release_device() is called and it removed the zdev from the zpci_list this instance can not be found via the zpci_list anymore even while allowing re-add of reserved devices. This only accounts for the overall lifetime and zpci_list addition and removal, it does not yet prevent concurrent add of a new instance for the same underlying device. Such concurrent add would subsequently cause issues such as attempted re-use of the same IOMMU sysfs directory and is generally undesired. Introduce a new zpci_add_remove_lock mutex to serialize adding a new device with removal. Together this ensures that if a struct zpci_dev is not found in the zpci_list it was either already removed and torn down, or its removal and tear down is in progress with the zpci_add_remove_lock held. Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Allow re-add of a reserved but not yet removed deviceNiklas Schnelle
commit 4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d upstream. The architecture assumes that PCI functions can be removed synchronously as PCI events are processed. This however clashes with the reference counting of struct pci_dev which allows device drivers to hold on to a struct pci_dev reference even as the underlying device is removed. To bridge this gap commit 2a671f77ee49 ("s390/pci: fix use after free of zpci_dev") keeps the struct zpci_dev in ZPCI_FN_STATE_RESERVED state until common code releases the struct pci_dev. Only when all references are dropped, the struct zpci_dev can be removed and freed. Later commit a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") moved the deletion of the struct zpci_dev from the zpci_list in zpci_release_device() to the point where the device is reserved. This was done to prevent handling events for a device that is already being removed, e.g. when the platform generates both PCI event codes 0x304 and 0x308. In retrospect, deletion from the zpci_list in the release function without holding the zpci_list_lock was also racy. A side effect of this handling is that if the underlying device re-appears while the struct zpci_dev is in the ZPCI_FN_STATE_RESERVED state, the new and old instances of the struct zpci_dev and/or struct pci_dev may clash. For example when trying to create the IOMMU sysfs files for the new instance. In this case, re-adding the new instance is aborted. The old instance is removed, and the device will remain absent until the platform issues another event. Fix this by allowing the struct zpci_dev to be brought back up right until it is finally removed. To this end also keep the struct zpci_dev in the zpci_list until it is finally released when all references have been dropped. Deletion from the zpci_list from within the release function is made safe by using kref_put_lock() with the zpci_list_lock. This ensures that the releasing code holds the last reference. Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Remove redundant bus removal and disable from zpci_release_device()Niklas Schnelle
commit d76f9633296785343d45f85199f4138cb724b6d2 upstream. Remove zpci_bus_remove_device() and zpci_disable_device() calls from zpci_release_device(). These calls were done when the device transitioned into the ZPCI_FN_STATE_STANDBY state which is guaranteed to happen before it enters the ZPCI_FN_STATE_RESERVED state. When zpci_release_device() is called the device is known to be in the ZPCI_FN_STATE_RESERVED state which is also checked by a WARN_ON(). Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-19s390/bpf: Store backchain even for leaf progsIlya Leoshkevich
[ Upstream commit 5f55f2168432298f5a55294831ab6a76a10cb3c3 ] Currently a crash in a leaf prog (caused by a bug) produces the following call trace: [<000003ff600ebf00>] bpf_prog_6df0139e1fbf2789_fentry+0x20/0x78 [<0000000000000000>] 0x0 This is because leaf progs do not store backchain. Fix by making all progs do it. This is what GCC and Clang-generated code does as well. Now the call trace looks like this: [<000003ff600eb0f2>] bpf_prog_6df0139e1fbf2789_fentry+0x2a/0x80 [<000003ff600ed096>] bpf_trampoline_201863462940+0x96/0xf4 [<000003ff600e3a40>] bpf_prog_05f379658fdd72f2_classifier_0+0x58/0xc0 [<000003ffe0aef070>] bpf_test_run+0x210/0x390 [<000003ffe0af0dc2>] bpf_prog_test_run_skb+0x25a/0x668 [<000003ffe038a90e>] __sys_bpf+0xa46/0xdb0 [<000003ffe038ad0c>] __s390x_sys_bpf+0x44/0x50 [<000003ffe0defea8>] __do_syscall+0x150/0x280 [<000003ffe0e01d5c>] system_call+0x74/0x98 Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250512122717.54878-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29hypfs_create_cpu_files(): add missing check for hypfs_mkdir() failureAl Viro
[ Upstream commit 00cdfdcfa0806202aea56b02cedbf87ef1e75df8 ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()Heiko Carstens
[ Upstream commit 9291ea091b29bb3e37c4b3416c7c1e49e472c7d5 ] An mm has pgstes only after s390_enable_sie() has been called, while mm_alloc_pgste() may be always true (e.g. via sysctl setting). Limit the calls to gmap_unlink() in pte_free_tlb() to those cases where there might be something to unlink. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18s390/entry: Fix last breaking event handling in case of stack corruptionHeiko Carstens
[ Upstream commit ae952eea6f4a7e2193f8721a5366049946e012e7 ] In case of stack corruption stack_invalid() is called and the expectation is that register r10 contains the last breaking event address. This dependency is quite subtle and broke a couple of years ago without that anybody noticed. Fix this by getting rid of the dependency and read the last breaking event address from lowcore. Fixes: 56e62a737028 ("s390: convert to generic entry") Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18s390/pci: Fix missing check for zpci_create_device() error returnNiklas Schnelle
commit 42420c50c68f3e95e90de2479464f420602229fc upstream. The zpci_create_device() function returns an error pointer that needs to be checked before dereferencing it as a struct zpci_dev pointer. Add the missing check in __clp_add() where it was missed when adding the scan_list in the fixed commit. Simply not adding the device to the scan list results in the previous behavior. Cc: stable@vger.kernel.org Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02crypto: lib/Kconfig - Hide arch options from userHerbert Xu
commit 17ec3e71ba797cdb62164fea9532c81b60f47167 upstream. The ARCH_MAY_HAVE patch missed arm64, mips and s390. But it may also lead to arch options being enabled but ineffective because of modular/built-in conflicts. As the primary user of all these options wireguard is selecting the arch options anyway, make the same selections at the lib/crypto option level and hide the arch options from the user. Instead of selecting them centrally from lib/crypto, simply set the default of each arch option as suggested by Eric Biggers. Change the Crypto API generic algorithms to select the top-level lib/crypto options instead of the generic one as otherwise there is no way to enable the arch options (Eric Biggers). Introduce a set of INTERNAL options to work around dependency cycles on the CONFIG_CRYPTO symbol. Fixes: 1047e21aecdf ("crypto: lib/Kconfig - Fix lib built-in failure when arch is modular") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@kernel.org> Closes: https://lore.kernel.org/oe-kbuild-all/202502232152.JC84YDLp-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02KVM: s390: Don't use %pK through debug printingThomas Weißschuh
[ Upstream commit 0c7fbae5bc782429c97d68dc40fb126748d7e352 ] Restricted pointers ("%pK") are only meant to be used when directly printing to a file from task context. Otherwise it can unintentionally expose security sensitive, raw pointer values. Use regular pointer formatting instead. Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20250217-restricted-pointers-s390-v1-2-0e4ace75d8aa@linutronix.de Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20250217-restricted-pointers-s390-v1-2-0e4ace75d8aa@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02KVM: s390: Don't use %pK through tracepointsThomas Weißschuh
[ Upstream commit 6c9567e0850be2f0f94ab64fa6512413fd1a1eb1 ] Restricted pointers ("%pK") are not meant to be used through TP_format(). It can unintentionally expose security sensitive, raw pointer values. Use regular pointer formatting instead. Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20250217-restricted-pointers-s390-v1-1-0e4ace75d8aa@linutronix.de Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20250217-restricted-pointers-s390-v1-1-0e4ace75d8aa@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20s390/cpumf: Fix double free on error in cpumf_pmu_event_init()Thomas Richter
commit aa1ac98268cd1f380c713f07e39b1fa1d5c7650c upstream. In PMU event initialization functions - cpumsf_pmu_event_init() - cpumf_pmu_event_init() - cfdiag_event_init() the partially created event had to be removed when an error was detected. The event::event_init() member function had to release all resources it allocated in case of error. event::destroy() had to be called on freeing an event after it was successfully created and event::event_init() returned success. With commit c70ca298036c ("perf/core: Simplify the perf_event_alloc() error path") this is not necessary anymore. The performance subsystem common code now always calls event::destroy() to clean up the allocated resources created during event initialization. Remove the event::destroy() invocation in PMU event initialization or that function is called twice for each event that runs into an error condition in event creation. This is the kernel log entry which shows up without the fix: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 43388 at lib/refcount.c:87 refcount_dec_not_one+0x74/0x90 CPU: 0 UID: 0 PID: 43388 Comm: perf Not tainted 6.15.0-20250407.rc1.git0.300.fc41.s390x+git #1 NONE Hardware name: IBM 3931 A01 704 (LPAR) Krnl PSW : 0704c00180000000 00000209cb2c1b88 (refcount_dec_not_one+0x78/0x90) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000020900000027 0000020900000023 0000000000000026 0000018900000000 00000004a2200a00 0000000000000000 0000000000000057 ffffffffffffffea 00000002b386c600 00000002b3f5b3e0 00000209cc51f140 00000209cc7fc550 0000000001449d38 ffffffffffffffff 00000209cb2c1b84 00000189d67dfb80 Krnl Code: 00000209cb2c1b78: c02000506727 larl %r2,00000209cbcce9c6 00000209cb2c1b7e: c0e5ffbd4431 brasl %r14,00000209caa6a3e0 #00000209cb2c1b84: af000000 mc 0,0 >00000209cb2c1b88: a7480001 lhi %r4,1 00000209cb2c1b8c: ebeff0a00004 lmg %r14,%r15,160(%r15) 00000209cb2c1b92: ec243fbf0055 risbg %r2,%r4,63,191,0 00000209cb2c1b98: 07fe bcr 15,%r14 00000209cb2c1b9a: 47000700 bc 0,1792 Call Trace: [<00000209cb2c1b88>] refcount_dec_not_one+0x78/0x90 [<00000209cb2c1dc4>] refcount_dec_and_mutex_lock+0x24/0x90 [<00000209caa3c29e>] hw_perf_event_destroy+0x2e/0x80 [<00000209cacaf8b4>] __free_event+0x74/0x270 [<00000209cacb47c4>] perf_event_alloc.part.0+0x4a4/0x730 [<00000209cacbf3e8>] __do_sys_perf_event_open+0x248/0xc20 [<00000209cacc14a4>] __s390x_sys_perf_event_open+0x44/0x50 [<00000209cb8114de>] __do_syscall+0x12e/0x260 [<00000209cb81ce34>] system_call+0x74/0x98 Last Breaking-Event-Address: [<00000209caa6a4d2>] __warn_printk+0xf2/0x100 ---[ end trace 0000000000000000 ]--- Fixes: c70ca298036c ("perf/core: Simplify the perf_event_alloc() error path") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20s390: Fix linker error when -no-pie option is unavailableSumanth Korikkar
commit 991a20173a1fbafd9fc0df0c7e17bb62d44a4deb upstream. The kernel build may fail if the linker does not support -no-pie option, as it always included in LDFLAGS_vmlinux. Error log: s390-linux-ld: unable to disambiguate: -no-pie (did you mean --no-pie ?) Although the GNU linker defaults to -no-pie, the ability to explicitly specify this option was introduced in binutils 2.36. Hence, fix it by adding -no-pie to LDFLAGS_vmlinux only when it is available. Cc: stable@vger.kernel.org Fixes: 00cda11d3b2e ("s390: Compile kernel with -fPIC and link with -no-pie") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503220342.T3fElO9L-lkp@intel.com/ Suggested-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFsNiklas Schnelle
commit 8691abd3afaadd816a298503ec1a759df1305d2e upstream. For non-VFs, zpci_bus_is_isolated_vf() should return false because they aren't VFs. While zpci_iov_find_parent_pf() specifically checks if a function is a VF, it then simply returns that there is no parent. The simplistic check for a parent then leads to these functions being confused with isolated VFs and isolating them on their own domain even if sibling PFs should share the domain. Fix this by explicitly checking if a function is not a VF. Note also that at this point the case where RIDs are ignored is already handled and in this case all PCI functions get isolated by being detected in zpci_bus_is_multifunction_root(). Cc: stable@vger.kernel.org Fixes: 2844ddbd540f ("s390/pci: Fix handling of isolated VFs") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20s390/pci: Fix s390_mmio_read/write syscall page fault handlingNiklas Schnelle
[ Upstream commit 41a0926e82f4963046876ed9a1b5f681be8087a8 ] The s390 MMIO syscalls when using the classic PCI instructions do not cause a page fault when follow_pfnmap_start() fails due to the page not being present. Besides being a general deficiency this breaks vfio-pci's mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on first access. Fix this by following a failed follow_pfnmap_start() with fixup_user_page() and retrying the follow_pfnmap_start(). Also fix a VM_READ vs VM_WRITE mixup in the read syscall. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocationSven Schnelle
[ Upstream commit 121df45b37a1016ee6828c2ca3ba825f3e18a8c1 ] When lowcore relocation is enabled, the machine check handler doesn't use the lowcore address when setting _CIF_MCCK_GUEST. Fix this by adding the missing base register. Fixes: 0001b7bbc53a ("s390/entry: Make mchk_int_handler() ready for lowcore relocation") Reported-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10s390: Remove ioremap_wt() and pgprot_writethrough()Niklas Schnelle
[ Upstream commit c94bff63e49302d4ce36502a85a2710a67332a4f ] It turns out that while s390 architecture calls its memory-I/O mapping variants write-through and write-back the implementation of ioremap_wt() and pgprot_writethrough() does not match Linux notion of ioremap_wt(). In particular Linux expects ioremap_wt() to be weaker still than ioremap_wc(), allowing not just gathering and re-ordering but also reads to be served from cache. Instead s390's implementation is equivalent to normal ioremap() while its ioremap_wc() allows re-ordering. Note that there are no known users of ioremap_wt() on s390 and the resulting behavior is in line with asm-generic defining ioremap_wt() as ioremap(), if undefined, so no breakage is expected. As s390 does not have a mapping type matching the Linux notion of ioremap_wt() and pgprot_writethrough(), simply drop them and rely on the asm-generic fallbacks instead. Fixes: b02002cc4c0f ("s390/pci: Implement ioremap_wc/prot() with MIO") Fixes: b43b3fff042d ("s390: mm: convert to GENERIC_IOREMAP") Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts
commit 02410ac72ac3707936c07ede66e94360d0d65319 upstream. In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13s390/traps: Fix test_monitor_call() inline assemblyHeiko Carstens
commit 5623bc23a1cb9f9a9470fa73b3a20321dc4c4870 upstream. The test_monitor_call() inline assembly uses the xgr instruction, which also modifies the condition code, to clear a register. However the clobber list of the inline assembly does not specify that the condition code is modified, which may lead to incorrect code generation. Use the lhi instruction instead to clear the register without that the condition code is modified. Furthermore this limits clearing to the lower 32 bits of val, since its type is int. Fixes: 17248ea03674 ("s390: fix __EMIT_BUG() macro") Cc: stable@vger.kernel.org Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27s390/boot: Fix ESSA detectionHeiko Carstens
commit c3a589fd9fcbf295a7402a4b188dc9277d505f4f upstream. The cmma_test_essa() inline assembly uses tmp as input and output, however tmp is specified as output only, which allows the compiler to optimize the initialization of tmp away. Therefore the ESSA detection may or may not work depending on previous contents of the register that the compiler selected for tmp. Fix this by using the correct constraint modifier. Fixes: 468a3bc2b7b9 ("s390/cmma: move parsing of cmma kernel parameter to early boot code") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21s390/pci: Fix handling of isolated VFsNiklas Schnelle
commit 2844ddbd540fc84d7571cca65d6c43088e4d6952 upstream. In contrast to the commit message of the fixed commit VFs whose parent PF is not configured are not always isolated, that is put on their own PCI domain. This is because for VFs to be added to an existing PCI domain it is enough for that PCI domain to share the same topology ID or PCHID. Such a matching PCI domain without a parent PF may exist when a PF from the same PCI card created the domain with the VF being a child of a different, non accessible, PF. While not causing technical issues it makes the rules which VFs are isolated inconsistent. Fix this by explicitly checking that the parent PF exists on the PCI domain determined by the topology ID or PCHID before registering the VF. This works because a parent PF which is under control of this Linux instance must be enabled and configured at the point where its child VFs appear because otherwise SR-IOV could not have been enabled on the parent. Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Cc: stable@vger.kernel.org Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()Niklas Schnelle
commit 05793884a1f30509e477de9da233ab73584b1c8c upstream. This creates a new zpci_iov_find_parent_pf() function which a future commit can use to find if a VF has a configured parent PF. Use zdev->rid instead of zdev->devfn such that the new function can be used before it has been decided if the RID will be exposed and zdev->devfn is set. Also handle the hypotheical case that the RID is not available but there is an otherwise matching zbus. Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Cc: stable@vger.kernel.org Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17s390/fpu: Add fpc exception handler / remove fixup section againHeiko Carstens
commit ae02615b7fcea9ce9a4ec40b3c5b5dafd322b179 upstream. The fixup section was added again by mistake when test_fp_ctl() was removed. The reason for the removal of the fixup section is described in commit 484a8ed8b7d1 ("s390/extable: add dedicated uaccess handler"). Remove it again for the same reason. Add an exception handler which handles exceptions when the floating point control register is attempted to be set to invalid values. The exception handler sets the floating point control register to zero and continues execution at the specified address. The new sfpc inline assembly is open-coded to make back porting a bit easier. Fixes: 702644249d3e ("s390/fpu: get rid of test_fp_ctl()") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17s390/pci: Fix SR-IOV for PFs initially in standbyNiklas Schnelle
commit dc287e4c9149ab54a5003b4d4da007818b5fda3d upstream. Since commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") PFs which are not initially configured but in standby are considered isolated. That is they create only a single function PCI domain. Due to the PCI domains being created on discovery, this means that even if they are configured later on, sibling PFs and their child VFs will not be added to their PCI domain breaking SR-IOV expectations. The reason the referenced commit ignored standby PFs for the creation of multi-function PCI subhierarchies, was to work around a PCI domain renumbering scenario on reboot. The renumbering would occur after removing a previously in standby PF, whose domain number is used for its configured sibling PFs and their child VFs, but which itself remained in standby. When this is followed by a reboot, the sibling PF is used instead to determine the PCI domain number of it and its child VFs. In principle it is not possible to know which standby PFs will be configured later and which may be removed. The PCI domain and root bus are pre-requisites for hotplug slots so the decision of which functions belong to which domain can not be postponed. With the renumbering occurring only in rare circumstances and being generally benign, accept it as an oddity and fix SR-IOV for initially standby PFs simply by allowing them to create PCI domains. Cc: stable@vger.kernel.org Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17KVM: s390: vsie: fix some corner-cases when grabbing vsie pagesDavid Hildenbrand
commit 5f230f41fdd9e799f43a699348dc572bca7159aa upstream. We try to reuse the same vsie page when re-executing the vsie with a given SCB address. The result is that we use the same shadow SCB -- residing in the vsie page -- and can avoid flushing the TLB when re-running the vsie on a CPU. So, when we allocate a fresh vsie page, or when we reuse a vsie page for a different SCB address -- reusing the shadow SCB in different context -- we set ihcpu=0xffff to trigger the flush. However, after we looked up the SCB address in the radix tree, but before we grabbed the vsie page by raising the refcount to 2, someone could reuse the vsie page for a different SCB address, adjusting page->index and the radix tree. In that case, we would be reusing the vsie page with a wrong page->index. Another corner case is that we might set the SCB address for a vsie page, but fail the insertion into the radix tree. Whoever would reuse that page would remove the corresponding radix tree entry -- which might now be a valid entry pointing at another page, resulting in the wrong vsie page getting removed from the radix tree. Let's handle such races better, by validating that the SCB address of a vsie page didn't change after we grabbed it (not reuse for a different SCB; the alternative would be performing another tree lookup), and by setting the SCB address to invalid until the insertion in the tree succeeded (SCB addresses are aligned to 512, so ULONG_MAX is invalid). These scenarios are rare, the effects a bit unclear, and these issues were only found by code inspection. Let's CC stable to be safe. Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: stable@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Tested-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Message-ID: <20250107154344.1003072-2-david@redhat.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17s390/futex: Fix FUTEX_OP_ANDN implementationHeiko Carstens
commit 26701574cee6777f867f89b4a5c667817e1ee0dd upstream. The futex operation FUTEX_OP_ANDN is supposed to implement *(int *)UADDR2 &= ~OPARG; The s390 implementation just implements an AND instead of ANDN. Add the missing bitwise not operation to oparg to fix this. This is broken since nearly 19 years, so it looks like user space is not making use of this operation. Fixes: 3363fbdd6fb4 ("[PATCH] s390: futex atomic operations") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17s390/stackleak: Use exrl instead of ex in __stackleak_poison()Sven Schnelle
[ Upstream commit a88c26bb8e04ee5f2678225c0130a5fbc08eef85 ] exrl is present in all machines currently supported, therefore prefer it over ex. This saves one instruction and doesn't need an additional register to hold the address of the target instruction. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08s390: Add '-std=gnu11' to decompressor and purgatory CFLAGSNathan Chancellor
commit 3b8b80e993766dc96d1a1c01c62f5d15fafc79b9 upstream. GCC changed the default C standard dialect from gnu17 to gnu23, which should not have impacted the kernel because it explicitly requests the gnu11 standard in the main Makefile. However, there are certain places in the s390 code that use their own CFLAGS without a '-std=' value, which break with this dialect change because of the kernel's own definitions of bool, false, and true conflicting with the C23 reserved keywords. include/linux/stddef.h:11:9: error: cannot use keyword 'false' as enumeration constant 11 | false = 0, | ^~~~~ include/linux/stddef.h:11:9: note: 'false' is a keyword with '-std=c23' onwards include/linux/types.h:35:33: error: 'bool' cannot be defined via 'typedef' 35 | typedef _Bool bool; | ^~~~ include/linux/types.h:35:33: note: 'bool' is a keyword with '-std=c23' onwards Add '-std=gnu11' to the decompressor and purgatory CFLAGS to eliminate these errors and make the C standard version of these areas match the rest of the kernel. Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20250122-s390-fix-std-for-gcc-15-v1-1-8b00cadee083@kernel.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08Revert "s390/mm: Allow large pages for KASAN shadow mapping"Vasily Gorbik
commit cc00550b2ae7ab1c7c56669fc004a13d880aaf0a upstream. This reverts commit ff123eb7741638d55abf82fac090bb3a543c1e74. Allowing large pages for KASAN shadow mappings isn't inherently wrong, but adding POPULATE_KASAN_MAP_SHADOW to large_allowed() exposes an issue in can_large_pud() and can_large_pmd(). Since commit d8073dc6bc04 ("s390/mm: Allow large pages only for aligned physical addresses"), both can_large_pud() and can_large_pmd() call _pa() to check if large page physical addresses are aligned. However, _pa() has a side effect: it allocates memory in POPULATE_KASAN_MAP_SHADOW mode. This results in massive memory leaks. The proper fix would be to address both large_allowed() and _pa()'s side effects, but for now, revert this change to avoid the leaks. Fixes: ff123eb77416 ("s390/mm: Allow large pages for KASAN shadow mapping") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()Heiko Carstens
[ Upstream commit 3bcc8a1af581af152d43e42b53db3534018301b5 ] With the switch to GENERIC_CPU_DEVICES an early call to the sclp subsystem was added to smp_prepare_cpus(). This will usually succeed since the sclp subsystem is implicitly initialized early enough if an sclp based console is present. If no such console is present the initialization happens with an arch_initcall(); in such cases calls to the sclp subsystem will fail. For CPU detection this means that the fallback sigp loop will be used permanently to detect CPUs instead of the preferred READ_CPU_INFO sclp request. Fix this by adding an explicit early sclp_init() call via arch_cpu_finalize_init(). Reported-by: Sheshu Ramanandan <sheshu.ramanandan@ibm.com> Fixes: 4a39f12e753d ("s390/smp: Switch to GENERIC_CPU_DEVICES") Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08s390/mm: Allow large pages for KASAN shadow mappingVasily Gorbik
[ Upstream commit ff123eb7741638d55abf82fac090bb3a543c1e74 ] Commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") introduced a large_allowed() helper that restricts which mapping modes can use large pages. This change unintentionally prevented KASAN shadow mappings from using large pages, despite there being no reason to avoid them. In fact, large pages are preferred for performance. Add POPULATE_KASAN_MAP_SHADOW to the allowed list in large_allowed() to restore large page mappings for KASAN shadows. While large_allowed() isn't strictly necessary with current mapping modes since disallowed modes either don't map anything or fail alignment and size checks, keep it for clarity. Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08perf/core: Save raw sample data conditionally based on sample typeYabin Cui
[ Upstream commit b9c44b91476b67327a521568a854babecc4070ab ] Currently, space for raw sample data is always allocated within sample records for both BPF output and tracepoint events. This leads to unused space in sample records when raw sample data is not requested. This patch enforces checking sample type of an event in perf_sample_save_raw_data(). So raw sample data will only be saved if explicitly requested, reducing overhead when it is not needed. Fixes: 0a9081cf0a11 ("perf/core: Add perf_sample_save_raw_data() helper") Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515193610.2350456-2-yabinc@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-27s390/mm: Fix DirectMap accountingHeiko Carstens
commit 41856638e6c4ed51d8aa9e54f70059d1e357b46e upstream. With uncoupling of physical and virtual address spaces population of the identity mapping was changed to use the type POPULATE_IDENTITY instead of POPULATE_DIRECT. This breaks DirectMap accounting: > cat /proc/meminfo DirectMap4k: 55296 kB DirectMap1M: 18446744073709496320 kB Adjust all locations of update_page_count() in vmem.c to use POPULATE_IDENTITY instead of POPULATE_DIRECT as well. With this accounting is correct again: > cat /proc/meminfo DirectMap4k: 54264 kB DirectMap1M: 8334336 kB Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27s390/mm: Consider KMSAN modules metadata for paging levelsVasily Gorbik
[ Upstream commit 282da38b465395c930687974627c24f47ddce5ff ] The calculation determining whether to use three- or four-level paging didn't account for KMSAN modules metadata. Include this metadata in the virtual memory size calculation to ensure correct paging mode selection and avoiding potentially unnecessary physical memory size limitations. Fixes: 65ca73f9fb36 ("s390/mm: define KMSAN metadata for vmalloc and modules") Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-27s390/ipl: Fix never less than zero warningAlexander Gordeev
[ Upstream commit 5fa49dd8e521a42379e5e41fcf2c92edaaec0a8b ] DEFINE_IPL_ATTR_STR_RW() macro produces "unsigned 'len' is never less than zero." warning when sys_vmcmd_on_*_store() callbacks are defined. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412081614.5uel8F6W-lkp@intel.com/ Fixes: 247576bf624a ("s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Fix leak of struct zpci_dev when zpci_add_device() failsNiklas Schnelle
commit 48796104c864cf4dafa80bd8c2ce88f9c92a65ea upstream. Prior to commit 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") the IOMMU was initialized and the device was registered as part of zpci_create_device() with the struct zpci_dev freed if either resulted in an error. With that commit this was moved into a separate function called zpci_add_device(). While this new function logs when adding failed, it expects the caller not to use and to free the struct zpci_dev on error. This difference between it and zpci_create_device() was missed while changing the callers and the incompletely initialized struct zpci_dev may get used in zpci_scan_configured_device in the error path. This then leads to a crash due to the device not being registered with the zbus. It was also not freed in this case. Fix this by handling the error return of zpci_add_device(). Since in this case the zdev was not added to the zpci_list it can simply be discarded and freed. Also make this more explicit by moving the kref_init() into zpci_add_device() and document that zpci_zdev_get()/zpci_zdev_put() must be used after adding. Cc: stable@vger.kernel.org Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14s390/cpum_sf: Handle CPU hotplug remove during samplingThomas Richter
[ Upstream commit a0bd7dacbd51c632b8e2c0500b479af564afadf3 ] CPU hotplug remove handling triggers the following function call sequence: CPUHP_AP_PERF_S390_SF_ONLINE --> s390_pmu_sf_offline_cpu() ... CPUHP_AP_PERF_ONLINE --> perf_event_exit_cpu() The s390 CPUMF sampling CPU hotplug handler invokes: s390_pmu_sf_offline_cpu() +--> cpusf_pmu_setup() +--> setup_pmc_cpu() +--> deallocate_buffers() This function de-allocates all sampling data buffers (SDBs) allocated for that CPU at event initialization. It also clears the PMU_F_RESERVED bit. The CPU is gone and can not be sampled. With the event still being active on the removed CPU, the CPU event hotplug support in kernel performance subsystem triggers the following function calls on the removed CPU: perf_event_exit_cpu() +--> perf_event_exit_cpu_context() +--> __perf_event_exit_context() +--> __perf_remove_from_context() +--> event_sched_out() +--> cpumsf_pmu_del() +--> cpumsf_pmu_stop() +--> hw_perf_event_update() to stop and remove the event. During removal of the event, the sampling device driver tries to read out the remaining samples from the sample data buffers (SDBs). But they have already been freed (and may have been re-assigned). This may lead to a use after free situation in which case the samples are most likely invalid. In the best case the memory has not been reassigned and still contains valid data. Remedy this situation and check if the CPU is still in reserved state (bit PMU_F_RESERVED set). In this case the SDBs have not been released an contain valid data. This is always the case when the event is removed (and no CPU hotplug off occured). If the PMU_F_RESERVED bit is not set, the SDB buffers are gone. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Ignore RID for isolated VFsNiklas Schnelle
[ Upstream commit 25f39d3dcb48bbc824a77d16b3d977f0f3713cfe ] Ensure that VFs used in isolation, that is with their parent PF not visible to the configuration but with their RID exposed, are treated compatibly with existing isolated VF use cases without exposed RID including RoCE Express VFs. This allows creating configurations where one LPAR manages PFs while their child VFs are used by other LPARs. This gives the LPAR managing the PFs a role analogous to that of the hypervisor in a typical use case of passing child VFs to guests. Instead of creating a multifunction struct zpci_bus whenever a PCI function with RID exposed is discovered only create such a bus for configured physical functions and only consider multifunction busses when searching for an existing bus. Additionally only set zdev->devfn to the devfn part of the RID once the function is added to a multifunction bus. This also fixes probing of more than 7 such isolated VFs from the same physical bus. This is because common PCI code in pci_scan_slot() only looks for more functions when pdev->multifunction is set which somewhat counter intutively is not the case for VFs. Note that PFs are looked at before their child VFs is guaranteed because we sort the zpci_list by RID ascending. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Use topology ID for multi-function devicesNiklas Schnelle
[ Upstream commit 126034faaac5f356822c4a9bebfa75664da11056 ] The newly introduced topology ID (TID) field in the CLP Query PCI Function explicitly identifies groups of PCI functions whose RIDs belong to the same (sub-)topology. When available use the TID instead of the PCHID to match zPCI busses/domains for multi-function devices. Note that currently only a single PCI bus per TID is supported. This change is required because in future machines the PCHID will not identify a PCI card but a specific port in the case of some multi-port NICs while from a PCI point of view the entire card is a subtopology. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Sort PCI functions prior to creating virtual bussesNiklas Schnelle
[ Upstream commit 0467cdde8c4320bbfdb31a8cff1277b202f677fc ] Instead of relying on the observed but not architected firmware behavior that PCI functions from the same card are listed in ascending RID order in clp_list_pci() ensure this by sorting. To allow for sorting separate the initial clp_list_pci() and creation of the virtual PCI busses. Note that fundamentally in our per-PCI function hotplug design non RID order of discovery is still possible. For example when the two PFs of a two port NIC are hotplugged after initial boot and in descending RID order. In this case the virtual PCI bus would be created by the second PF using that PF's UID as domain number instead of that of the first PF. Thus the domain number would then change from the UID of the second PF to that of the first PF on reboot but there is really nothing we can do about that since changing domain numbers at runtime seems even worse. This only impacts the domain number as the RIDs are consistent and thus even with just the second PF visible it will show up in the correct position on the virtual bus. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09s390/stacktrace: Use break instead of return statementHeiko Carstens
commit 588a9836a4ef7ec3bfcffda526dfa399637e6cfc upstream. arch_stack_walk_user_common() contains a return statement instead of a break statement in case store_ip() fails while trying to store a callchain entry of a user space process. This may lead to a missing pagefault_enable() call. If this happens any subsequent page fault of the process won't be resolved by the page fault handler and this in turn will lead to the process being killed. Use a break instead of a return statement to fix this. Fixes: ebd912ff9919 ("s390/stacktrace: Merge perf_callchain_user() and arch_stack_walk_user()") Cc: stable@vger.kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09s390/entry: Mark IRQ entries to fix stack depot warningsVasily Gorbik
commit 45c9f2b856a075a34873d00788d2e8a250c1effd upstream. The stack depot filters out everything outside of the top interrupt context as an uninteresting or irrelevant part of the stack traces. This helps with stack trace de-duplication, avoiding an explosion of saved stack traces that share the same IRQ context code path but originate from different randomly interrupted points, eventually exhausting the stack depot. Filtering uses in_irqentry_text() to identify functions within the .irqentry.text and .softirqentry.text sections, which then become the last stack trace entries being saved. While __do_softirq() is placed into the .softirqentry.text section by common code, populating .irqentry.text is architecture-specific. Currently, the .irqentry.text section on s390 is empty, which prevents stack depot filtering and de-duplication and could result in warnings like: Stack depot reached limit capacity WARNING: CPU: 0 PID: 286113 at lib/stackdepot.c:252 depot_alloc_stack+0x39a/0x3c8 with PREEMPT and KASAN enabled. Fix this by moving the IO/EXT interrupt handlers from .kprobes.text into the .irqentry.text section and updating the kprobes blacklist to include the .irqentry.text section. This is done only for asynchronous interrupts and explicitly not for program checks, which are synchronous and where the context beyond the program check is important to preserve. Despite machine checks being somewhat in between, they are extremely rare, and preserving context when possible is also of value. SVCs and Restart Interrupts are not relevant, one being always at the boundary to user space and the other being a one-time thing. IRQ entries filtering is also optionally used in ftrace function graph, where the same logic applies. Cc: stable@vger.kernel.org # 5.15+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05s390/pci: Fix potential double remove of hotplug slotNiklas Schnelle
[ Upstream commit c4a585e952ca403a370586d3f16e8331a7564901 ] In commit 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") the zpci_exit_slot() was moved from zpci_device_reserved() to zpci_release_device() with the intention of keeping the hotplug slot around until the device is actually removed. Now zpci_release_device() is only called once all references are dropped. Since the zPCI subsystem only drops its reference once the device is in the reserved state it follows that zpci_release_device() must only deal with devices in the reserved state. Despite that it contains code to tear down from both configured and standby state. For the standby case this already includes the removal of the hotplug slot so would cause a double removal if a device was ever removed in either configured or standby state. Instead of causing a potential double removal in a case that should never happen explicitly WARN_ON() if a device in non-reserved state is released and get rid of the dead code cases. Fixes: 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05iommu/s390: Implement blocking domainMatthew Rosato
[ Upstream commit ecda483339a5151e3ca30d6b82691ef6f1d17912 ] This fixes a crash when surprise hot-unplugging a PCI device. This crash happens because during hot-unplug __iommu_group_set_domain_nofail() attaching the default domain fails when the platform no longer recognizes the device as it has already been removed and we end up with a NULL domain pointer and UAF. This is exactly the case referred to in the second comment in __iommu_device_set_domain() and just as stated there if we can instead attach the blocking domain the UAF is prevented as this can handle the already removed device. Implement the blocking domain to use this handling. With this change, the crash is fixed but we still hit a warning attempting to change DMA ownership on a blocked device. Fixes: c76c067e488c ("s390/pci: Use dma-iommu layer") Co-developed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240910211516.137933-1-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>