summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-17s390/extmem: Add workaround for DCSS unload diagGerald Schaefer
When calling the diag for DCSS unload on a non-IPL CPU, the sclp maximum memory detection on the next IPL would falsely return the end of the previously loaded DCSS. This is because of an issue in z/VM, so work around it by always calling the diag for DCSS unload on IPL CPU 0. That CPU cannot be set offline, so the dcss_diag() call can directly be scheduled to CPU 0. The wrong maximum memory value returned by sclp would only affect KASAN kernels. When a DCSS within the falsely reported extra memory range is loaded and accessed again, it would result in a kernel crash: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 001c0000a3ffe000 TEID: 001c0000a3ffe803 Fault in home space mode while using kernel ASCE. AS:000000039955400b R2:00000003fe3b400b R3:000000037a2a8007 S:0000000000000020 Oops: 0010 ilc:3 [#1]SMP [...] CPU: 2 UID: 0 PID: 1563 Comm: mount Kdump: loaded Not tainted 6.15.0-rc5-11546-g3ea93fb3d026-dirty #7 NONE Hardware name: IBM 3931 A01 704 (z/VM 7.4.0) Krnl PSW : 0704c00180000000 000da6f2b338faf2 (kasan_check_range+0x172/0x310) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000040 001c0000a3ffe000 000000051fff0000 0000000000001000 0000000000000000 000da6f233380ff6 00000000000001f8 0000000000000000 001c0000a3ffe200 0000000000000040 001c0000a3ffe200 0000000000000200 000003ff97a2cfa8 0000000000000000 0000000000000010 000da672b58af070 Krnl Code: 000da6f2b338fae2: 41101008 la %r1,8(%r1) 000da6f2b338fae6: eca100268064 cgrj %r10,%r1,8,000da6f2b338fb32 #000da6f2b338faec: ebe00002000c srlg %r14,%r0,2 >000da6f2b338faf2: e3b010000002 ltg %r11,0(%r1) 000da6f2b338faf8: a77400a8 brc 7,000da6f2b338fc48 000da6f2b338fafc: 41b01008 la %r11,8(%r1) 000da6f2b338fb00: b904001b lgr %r1,%r11 000da6f2b338fb04: e3a0b0000002 ltg %r10,0(%r11) Call Trace: [<000da6f2b338faf2>] kasan_check_range+0x172/0x310 [<000da6f2b3390b3c>] __asan_memcpy+0x3c/0x90 [<000da6f233380ff6>] dcssblk_submit_bio+0x3a6/0x620 [dcssblk] [<000da6f2b3eb403c>] __submit_bio+0x25c/0x4a0 [<000da6f2b3eb43bc>] __submit_bio_noacct+0x13c/0x450 [<000da6f2b3eb4bde>] submit_bio_noacct_nocheck+0x50e/0x620 [<000da6f2b34f4978>] mpage_readahead+0x318/0x3f0 [<000da6f2b31edbe6>] read_pages+0x156/0x740 [<000da6f2b31ee594>] page_cache_ra_unbounded+0x3c4/0x610 [<000da6f2b31ef094>] force_page_cache_ra+0x1f4/0x2d0 [<000da6f2b31d092e>] filemap_get_pages+0x2ce/0xaa0 [<000da6f2b31d1428>] filemap_read+0x328/0x9a0 [<000da6f2b3e9b7e8>] blkdev_read_iter+0x228/0x3b0 [<000da6f2b340f7a6>] vfs_read+0x5b6/0x7f0 [<000da6f2b34110be>] ksys_read+0x10e/0x1e0 [<000da6f2b4e7acb2>] __do_syscall+0x122/0x1f0 [<000da6f2b4e93ffe>] system_call+0x6e/0x90 Last Breaking-Event-Address: [<000da6f2b338faac>] kasan_check_range+0x12c/0x310 Kernel panic - not syncing: Fatal exception: panic_on_oops Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-17Merge branch 'prot-key-async'Heiko Carstens
Harald Freudenberger says: ==================== This is a complete rework of the protected key AES (PAES) implementation. The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts) in a real asynchronous fashion: - init(), exit() and setkey() are synchronous and don't allocate any memory. - the encrypt/decrypt functions first try to do the job in a synchronous manner. If this fails, for example the protected key got invalid caused by a guest suspend/resume or guest migration action, the encrypt/decrypt is transferred to an instance of the crypto engine (see below) for asynchronous processing. These postponed requests are then handled by the crypto engine by invoking the do_one_request() callback but may of course again run into a still not converted key or the key is getting invalid. If the key is still not converted, the first thread does the conversion and updates the key status in the transformation context. The conversion is invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC. Note that once there is an active requests enqueued to get async processed via crypto engine, further requests also need to go via crypto engine to keep the request sequence. This patch together with the pkey/zcrypt/AP extensions to support the new PKEY_XFLAG_NOMEMMALOC should toughen the paes crypto algorithms to truly meet the requirements for in-kernel skcipher implementations and the usage patterns for the dm-crypt and dm-integrity layers. The new flag PKEY_XFLAG_NOMEMALLOC tells the PKEY layer (and subsidiary layers) that it must not allocate any memory causing IO operations. Note that the patches for this pkey/zcrypt/AP extensions are currently in the features branch but may be seen in the master branch with the next merge. There is still some confusion about the way how paes treats the key within the transformation context. The tfm context may be shared by multiple requests running en/decryption with the same key. So the tfm context is supposed to be read-only. The s390 protected key support is in fact an encrypted key with the wrapping key sitting in the firmware. On each invocation of a protected key instruction the firmware unwraps the pkey and performs the operation. Part of the protected key is a hash about the wrapping key used - so the firmware is able to detect if a protected key matches to the wrapping key or not. If there is a mismatch the cpacf operation fails with cc 1 (key invalid). Such a situation can occur for example with a kvm live guest migration to another machine where the guest simple awakens in a new environment. As the wrapping key is NOT transfered, after the reawakening all protected key cpacf operations fail with "key invalid". There exist other situations where a protected key cpacf operation may run into "key invalid" and thus the code needs to be prepared for such cpacf failures. The recovery is simple: via pkey API the source key material (in real cases this is usually a secure key bound to a HSM) needs to generate a new protected key which is the wrapped by the wrapping key of the current firmware. So the paes tfms hold the source key material to be able to re-generate the protected key at any time. A naive implementation would hold the protected key in some kind of running context (for example the request context) and only the source key would be stored in the tfm context. But the derivation of the protected key from the source key is an expensive and time consuming process often involving interaction with a crypto card. And such a naive implementation would then for every tfm in use trigger the derivation process individual. So why not store the protected key in tfm context and only the very first process hitting the "invalid key" cc runs the derivation and updates the protected key stored in the tfm. The only really important thing is that the protected key update and cloning from this value needs to be done in a atomic fashion. Please note that there are still race conditions where the protected key stored in the tfm may get updated by an (outdated) protected key value. This is not an issue and the code handles this correctly by again re-deriving the protected key. The only fact that matters, is that the protected key must always be in a state where the cpacf instructions can figure out if it is valid (the hash part of the protected key matches to the hash of the wrapping key) or invalid (and refuse the crypto operation with "invalid key"). Changelog: v1 - first version. Applied and tested on top of the mentioned pkey/zcrypt/AP changes. Selftests and multithreaded testcases executed via AP_ALG interface run successful and even instrumented code (with some sleeps to force asynch pathes) ran fine. Code is good enough for a first code review and collecting feedback. v2 - A new patch which does a slight rework of the cpacf_pcc() inline function to return the condition code. A rework of the paes implementation based on feedback from Herbert and Ingo: - the spinlock is now consequently used to protect updates and changes on the protected key and protected key state within the transformation context. - setkey() is now synchronous - the walk is now held in the request context and thus the postponing of a request to the engine and later processing can continue at exactly the same state. - the param block needed for the cpacf instructions is constructed once and held in the request context. - if a request can't get handled synchronous, it is postponed for asynch processing via an instance of the crpyto engine. With v2 comes a patch which updates the crypto engine docu in Documentation/crypto. Feel free to use it or drop it or do some rework - at least it needs some review. v2 was only posted internal to collect some feedback within IBM. v3 - Slight improvements based on feedback from Finn. v4 - With feedback from Holger and Herbert Xu. Holger gave some good hints about better readability of the code and I picked nearly all his suggestions. Herbert noted that once a request goes via engine to keep the sequence as long as there are requests enqueued the following requests should also go via engine. This is now realized via a via_engine_ctr atomic counter in the tfm context. Stress tested with lots of debug code to run through all the failure paths of the code. Looks good. v5 - Fixed two typos and 1 too long line in the commit message found by Holger. Added Acked-by and Reviewed-by. Removed patch #3 which updates the crypto engine docu - this will go separate. All prepared for picking in the s390 subsystem. ==================== Link: https://lore.kernel.org/r/20250514090955.72370-1-freude@linux.ibm.com/ Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-17s390/crypto: Rework protected key AES for true asynch supportHarald Freudenberger
This is a complete rework of the protected key AES (PAES) implementation. The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts) in a real asynchronous fashion: - init(), exit() and setkey() are synchronous and don't allocate any memory. - the encrypt/decrypt functions first try to do the job in a synchronous manner. If this fails, for example the protected key got invalid caused by a guest suspend/resume or guest migration action, the encrypt/decrypt is transferred to an instance of the crypto engine (see below) for asynchronous processing. These postponed requests are then handled by the crypto engine by invoking the do_one_request() callback but may of course again run into a still not converted key or the key is getting invalid. If the key is still not converted, the first thread does the conversion and updates the key status in the transformation context. The conversion is invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC. Note that once there is an active requests enqueued to get async processed via crypto engine, further requests also need to go via crypto engine to keep the request sequence. This patch together with the pkey/zcrypt/AP extensions to support the new PKEY_XFLAG_NOMEMMALOC should toughen the paes crypto algorithms to truly meet the requirements for in-kernel skcipher implementations and the usage patterns for the dm-crypt and dm-integrity layers. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20250514090955.72370-3-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-17s390/cpacf: Rework cpacf_pcc() to return condition codeHarald Freudenberger
Some of the pcc sub-functions have a protected key as input and thus may run into the situation that this key may be invalid for example due to live guest migration to another physical hardware. Rework the inline assembler function cpacf_pcc() to return the condition code (cc) as return value: 0 - cc code 0 (normal completion) 1 - cc code 1 (prot key wkvp mismatch or src op out of range) 2 - cc code 2 (something invalid, scalar multiply infinity, ...) Note that cc 3 (partial completion) is handled within the asm code and never returned. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250514090955.72370-2-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05s390/mm: Fix potential use-after-free in __crst_table_upgrade()Heiko Carstens
The pointer to the mm_struct which is passed to __crst_table_upgrade() may only be dereferenced if it is identical to current->active_mm. Otherwise the current task has no reference to the mm_struct and it may already be freed. In such a case this would result in a use-after-free bug. Make sure this use-after-free scenario does not happen by moving the code, which dereferences the mm_struct pointer, after the check which verifies that the pointer is identical to current->active_mm, like it was before lazy ASCE handling was reimplemented. Fixes: 8b72f5a97b82 ("s390/mm: Reimplement lazy ASCE handling") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05s390/mm: Add mmap_assert_write_locked() check to crst_table_upgrade()Heiko Carstens
Add mmap_assert_write_locked() check to crst_table_upgrade() in order to verify that no concurrent page table upgrades of an mm can happen. This allows to remove the VM_BUG_ON() check which checks for the potential inconsistent result of concurrent updates. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/string: Remove strcpy() implementationHeiko Carstens
Remove the optimized strcpy() library implementation. This doesn't make any difference since gcc recognizes all strcpy() usages anyway and uses the builtin variant. There is not a single branch to strcpy() within the generated kernel image, which also seems to be the reason why most other architectures don't have a strcpy() implementation anymore. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/con3270: Use strscpy() instead of strcpy()Heiko Carstens
Use strscpy() instead of strcpy() so that bounds checking is performed on the destination buffer. This requires to keep track of the size of the dynamically allocated prompt memory area, which is done with a new prompt_sz within struct tty3270. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/boot: Use strspcy() instead of strcpy()Heiko Carstens
Convert all strcpy() usages to strscpy(). strcpy() is deprecated since it performs no bounds checking on the destination buffer. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390: Simple strcpy() to strscpy() conversionsHeiko Carstens
Convert all strcpy() usages to strscpy() where the conversion means just replacing strcpy() with strscpy(). strcpy() is deprecated since it performs no bounds checking on the destination buffer. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30Merge branch 'zcrypt-no-alloc'Heiko Carstens
Harald Freudenberger says: ==================== This series of patches has the goal to open up a do-not-allocate memory path from the callers of the pkey in-kernel api down to the crypto cards and back. The asynch in-kernel cipher implementations (and the s390 PAES cipher implementations are one of them) may be called in a context where memory allocations which trigger IO is not acceptable. So this patch series reworks the AP bus code, the zcrypt layer, the pkey layer and the pkey handlers to respect this situation by processing a new parameter xflags (execution hints flags). There is a flag PKEY_XFLAG_NOMEMALLOC which tells the code to not allocate memory which may lead to IO operations. To reach this goal, the actual code changes have been differed. The zcrypt misc functions which need memory for cprb build use a pre allocated memory pool for this purpose. The findcard() functions have one temp memory area preallocated and protected with a mutex. Some smaller data is not allocated any more but went to the stack instead. The AP bus also uses a pre-allocated memory pool for building AP message requests. Note that the PAES implementation still needs to get reworked to run the protected key derivation in a real asynchronous way. However, this rework of AP bus, zcrypt and pkey is the base work required before reconsidering the PAES implementation. The patch series starts bottom (AP bus) and goes up the call chain (PKEY). At any time in the patch stack it should compile. For easier review I tried to have one logic code change by each patch and thus keep the patches "small". ==================== Link: https://lore.kernel.org/r/20250424133619.16495-1-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey/crypto: Introduce xflags param for pkey in-kernel APIHarald Freudenberger
Add a new parameter xflags to the in-kernel API function pkey_key2protkey(). Currently there is only one flag supported: * PKEY_XFLAG_NOMEMALLOC: If this flag is given in the xflags parameter, the pkey implementation is not allowed to allocate memory but instead should fall back to use preallocated memory or simple fail with -ENOMEM. This flag is for protected key derive within a cipher or similar which must not allocate memory which would cause io operations - see also the CRYPTO_ALG_ALLOCATES_MEMORY flag in crypto.h. The one and only user of this in-kernel API - the skcipher implementations PAES in paes_s390.c set this flag upon request to derive a protected key from the given raw key material. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-26-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Provide and pass xflags within pkey and zcrypt layersHarald Freudenberger
Provide and pass the xflag parameter from pkey ioctls through the pkey handler and further down to the implementations (CCA, EP11, PCKMO and UV). So all the code is now prepared and ready to support xflags ("execution flag"). The pkey layer supports the xflag PKEY_XFLAG_NOMEMALLOC: If this flag is given in the xflags parameter, the pkey implementation is not allowed to allocate memory but instead should fall back to use preallocated memory or simple fail with -ENOMEM. This flag is for protected key derive within a cipher or similar which must not allocate memory which would cause io operations - see also the CRYPTO_ALG_ALLOCATES_MEMORY flag in crypto.h. Within the pkey handlers this flag is then to be translated to appropriate zcrypt xflags before any zcrypt related functions are called. So the PKEY_XFLAG_NOMEMALLOC translates to ZCRYPT_XFLAG_NOMEMALLOC - If this flag is set, no memory allocations which may trigger any IO operations are done. The pkey in-kernel pkey API still does not provide this xflag param. That's intended to come with a separate patch which enables this functionality. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-25-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/uv: Remove uv_get_secret_metadata functionHarald Freudenberger
The uv_get_secret_metadata() in-kernel function was only offered and used by the pkey uv handler. Remove it as there is no customer any more. Suggested-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Acked-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-24-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Use preallocated memory for retrieve of UV secret metadataHarald Freudenberger
The pkey uv functions may be called in a situation where memory allocations which trigger IO operations are not allowed. An example: decryption of the swap partition with protected key (PAES). The pkey uv code takes care of this by holding one preallocated struct uv_secret_list to be used with the new UV function uv_find_secret(). The older function uv_get_secret_metadata() used before always allocates/frees an ephemeral memory buffer. The preallocated struct is concurrency protected by a mutex. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-23-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/uv: Rename find_secret() to uv_find_secret() and publishHarald Freudenberger
Rename the internal UV function find_secret() to uv_find_secret() and publish it as new UV API in-kernel function. The pkey uv handler may be called in a do-not-allocate memory situation where sleeping is allowed but allocating memory which may cause IO operations is not. For example when an encrypted swap file is used and the encryption is done via UV retrievable secrets with protected keys. The UV API function uv_get_secret_metadata() allocates memory and then calls the find_secret() function. By exposing the find_secret() function as a new UV API function uv_find_secret() it is possible to retrieve UV secret meta data without any memory allocations from the UV when the caller offers space for one struct uv_secret_list. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Acked-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-22-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Rework EP11 pkey handler to use stack for small memory allocsHarald Freudenberger
There have been some places in the EP11 handler code where relatively small amounts of memory have been allocated an freed at the end of the function. This code has been reworked to use the stack instead. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-21-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Rework CCA pkey handler to use stack for small memory allocsHarald Freudenberger
There have been some places in the CCA handler code where relatively small amounts of memory have been allocated an freed at the end of the function. This code has been reworked to use the stack instead. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-20-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework ep11 misc functions to use cprb mempoolHarald Freudenberger
There are two places in the ep11 misc code where a short term memory buffer is needed. Rework this code to use the cprb mempool to satisfy this ephemeral memory requirements. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-19-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Locate ep11_domain_query_info onto the stack instead of kmallocHarald Freudenberger
Locate the relative small struct ep11_domain_query_info variable onto the stack instead of kmalloc()/kfree(). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-18-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Propagate xflags argument with cca_get_info()Harald Freudenberger
Propagate the xflags argument from the cca_get_info() caller down to the lower level functions for proper memory allocation hints. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-17-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework cca misc functions kmallocs to use the cprb mempoolHarald Freudenberger
Rework two places in the zcrypt cca misc code using kmalloc() for ephemeral memory allocation. As there is anyway now a cprb mempool let's use this pool instead to satisfy these short term memory allocations. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-16-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework ep11 findcard() implementation and callersHarald Freudenberger
Rework the memory usage of the ep11 findcard() implementation: - findcard does not allocate memory for the list of apqns any more. - the callers are now responsible to provide an array of apqns to store the matching apqns into. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-15-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework cca findcard() implementation and callersHarald Freudenberger
Rework the memory usage of the cca findcard() implementation: - findcard does not allocate memory for the list of apqns any more. - the callers are now responsible to provide an array of apqns to store the matching apqns into. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-14-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Remove CCA and EP11 card and domain info cachesHarald Freudenberger
Remove the caching of the CCA and EP11 card and domain info. In nearly all places where the card or domain info is fetched the verify param was enabled and thus the cache was bypassed. The only real place where info from the cache was used was in the sysfs pseudo files in cases where the card/queue was switched to "offline". All other callers insisted on getting fresh info and thus a communication to the card was enforced. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-13-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Remove unused functions from cca miscHarald Freudenberger
The static function findcard() and the zcrypt cca_findcard() function are both not used any more. Remove this outdated code and an internal function only called by these. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-12-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce pre-allocated device status array for ep11 miscHarald Freudenberger
Introduce a pre-allocated device status array memory together with a mutex controlling the occupation to be used by the findcard() function. Limit the device status array to max 128 cards and max 128 domains to reduce the size of this pre-allocated memory to 64 KB. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-11-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce pre-allocated device status array for cca miscHarald Freudenberger
Introduce a pre-allocated device status array memory together with a mutex controlling the occupation to be used by the findcard2() function. Limit the device status array to max 128 cards and max 128 domains to reduce the size of this pre-allocated memory to 64 KB. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-10-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework zcrypt function zcrypt_device_status_mask_extHarald Freudenberger
Rework the existing function zcrypt_device_status_mask_ext(): Add two new parameters to provide upper limits for cards and queues. The existing implementation needed an array of 256 * 256 * 4 = 256 KB which is really huge. The reworked function is more flexible in the sense that the caller can decide the upper limit for cards and domains to be stored into the status array. So for example a caller may decide to only query for cards 0...127 and queues 0...127 and thus only an array of size 128 * 128 * 4 = 64 KB is needed. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-9-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce cprb mempool for ep11 misc functionsHarald Freudenberger
Introduce a cprb mempool for the zcrypt ep11 misc functions (zcrypt_ep11misc.*) do some preparation rework to support a do-not-allocate path through some zcrypt ep11 misc functions. The mempool is controlled by the zcrypt module parameter "mempool_threshold" which shall control the minimal amount of memory items for CCA and EP11. The mempool shall support "mempool_threshold" requests/replies in parallel which means for EP11 to hold a send and receive buffer memory per request. Each of this cprb space items is limited to 8 KB. So by default the mempool consumes 5 * 2 * 8KB = 80KB If the mempool is depleted upon one ep11 misc functions is called with the ZCRYPT_XFLAG_NOMEMALLOC xflag set, the function will fail with -ENOMEM and the caller is responsible for taking further actions. This is only part of an rework to support a new xflag ZCRYPT_XFLAG_NOMEMALLOC but not yet complete. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-8-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce cprb mempool for cca misc functionsHarald Freudenberger
Introduce a new module parameter "zcrypt_mempool_threshold" for the zcrypt module. This parameter controls the minimal amount of mempool items which are pre-allocated for urgent requests/replies and will be used with the support for the new xflag ZCRYPT_XFLAG_NOMEMALLOC. The default value of 5 shall provide enough memory items to support up to 5 requests (and their associated reply) in parallel. The minimum value is 1 and is checked in zcrypt module init(). If the mempool is depleted upon one cca misc functions is called with the named xflag set, the function will fail with -ENOMEM and the caller is responsible for taking further actions. For CCA each mempool item is 16KB, as a CCA CPRB needs to hold the request and the reply. The pool items only support requests/replies with a limit of about 8KB. So by default the CCA mempool consumes 5 * 16KB = 80KB This is only part of an rework to support a new xflag ZCRYPT_XFLAG_NOMEMALLOC but not yet complete. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-7-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap/zcrypt: New xflag parameterHarald Freudenberger
Introduce a new flag parameter for the both cprb send functions zcrypt_send_cprb() and zcrypt_send_ep11_cprb(). This new xflags parameter ("execution flags") shall be used to provide execution hints and flags for this crypto request. There are two flags implemented to be used with these functions: * ZCRYPT_XFLAG_USERSPACE - indicates to the lower layers that all the ptrs address userspace. So when construction the ap msg copy_from_user() is to be used. If this flag is NOT set, the ptrs address kernel memory and thus memcpy() is to be used. * ZCRYPT_XFLAG_NOMEMALLOC - indicates that this task must not allocate memory which may be allocated with io operations. For the AP bus and zcrypt message layer this means: * The ZCRYPT_XFLAG_USERSPACE is mapped to the already existing bool variable "userspace" which is propagated to the zcrypt proto implementations. * The ZCRYPT_XFLAG_NOMEMALLOC results in setting the AP flag AP_MSG_FLAG_MEMPOOL when the AP msg buffer is initialized. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-6-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Avoid alloc and copy of ep11 targets if kernelspace cprbHarald Freudenberger
If there is a target list of APQNs given when an CPRB is to be send via zcrypt_send_ep11_cprb() there is always a kmalloc() done and the targets are copied via z_copy_from_user. As there are callers from kernel space (zcrypt_ep11misc.c) which signal this via the userspace parameter improve this code to directly use the given target list in case of kernelspace thus removing the unnecessary memory alloc and mem copy. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-5-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap: Introduce ap message buffer poolHarald Freudenberger
There is a need for a do-not-allocate-memory path through the AP bus layer. The pkey layer may be triggered via the in-kernel interface from a protected key crypto algorithm (namely PAES) to convert a secure key into a protected key. This happens in a workqueue context, so sleeping is allowed but memory allocations causing IO operations are not permitted. To accomplish this, an AP message memory pool with pre-allocated space is established. When ap_init_apmsg() with use_mempool set to true is called, instead of kmalloc() the ap message buffer is allocated from the ap_msg_pool. This pool only holds a limited amount of buffers: ap_msg_pool_min_items with the item size AP_DEFAULT_MAX_MSG_SIZE and exactly one of these items (if available) is returned if ap_init_apmsg() with the use_mempool arg set to true is called. When this pool is exhausted and use_mempool is set true, ap_init_apmsg() returns -ENOMEM without any attempt to allocate memory and the caller has to deal with that. Default values for this mempool of ap messages is: * Each buffer is 12KB (that is the default AP bus size and all the urgent messages should fit into this space). * Minimum items held in the pool is 8. This value is adjustable via module parameter ap.msgpool_min_items. The zcrypt layer may use this flag to indicate to the ap bus that the processing path for this message should not allocate memory but should use pre-allocated memory buffer instead. This is to prevent deadlocks with crypto and io for example with encrypted swap volumes. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-4-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap/zcrypt: Rework AP message buffer allocationHarald Freudenberger
Slight rework on the way how AP message buffers are allocated. Instead of having multiple places with kmalloc() calls all the AP message buffers are now allocated and freed on exactly one place: ap_init_apmsg() allocates the current AP bus max limit of ap_max_msg_size (defaults to 12KB). The AP message buffer is then freed in ap_release_apmsg(). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-3-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap: Move response_type struct into ap_msg structHarald Freudenberger
Move the very small response_type struct into struct ap_msg. So there is no need to kmalloc this tiny struct with each ap message preparation. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-2-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/cpumf: Adjust number of leading zeroes for z15 attributesThomas Richter
In CPUMF attribute definitions for z15 all CPUMF attributes have configuration values of the form 0x0[0-9a-f]{3} . However 2 defines do not match this scheme, they have two leading zeroes instead of one. Adjust this. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-17s390: Remove optional third argument of strscpy() if possibleHeiko Carstens
The third argument of strscpy() is optional and can be left away iff the destination is an array and the maximum size of the copy is the size of destination. Remove the third argument for those cases where this is possible. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-17s390/ipl: Rename and change strncpy_skip_quote()Heiko Carstens
Rename strncpy_skip_quote() to strscpy_skip_quote() and change its implementation so that the destination string is always NUL terminated. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-17s390/string: Remove optimized strncpy()Heiko Carstens
There are hardly any strncpy() users left, therefore drop the optimized s390 variant. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-16watchdog: diag288_wdt: Implement module autoloadHeiko Carstens
The s390 specific diag288_wdt watchdog driver makes use of the virtual watchdog timer, which is available in most machine configurations. If executing the diagnose instruction with subcode 0x288 results in an exception the watchdog timer is not available, otherwise it is available. In order to allow module autoload of the diag288_wdt module, move the detection of the virtual watchdog timer to early boot code, and provide its availability as a cpu feature. This allows to make use of module_cpu_feature_match() to automatically load the module iff the virtual watchdog timer is available. Suggested-by: Marc Hartmayer <mhartmay@linux.ibm.com> Tested-by: Mete Durlu <meted@linux.ibm.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20250410095036.1525057-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-14s390/boot: Replace strncpy() with strscpy()Vasily Gorbik
Replace the last 2 usages of strncpy() in s390 code with strscpy(). Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-14s390/boot: Add sized_strscpy() to enable strscpy() usageVasily Gorbik
Add a simple sized_strscpy() implementation to allow the use of strscpy() in the decompressor. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-14s390/mm: Select ARCH_WANT_IRQS_OFF_ACTIVATE_MMHeiko Carstens
Select ARCH_WANT_IRQS_OFF_ACTIVATE_MM so that activate_mm() is called with irqs disabled. This allows to call switch_mm_irqs_off() instead of switch_mm() and saves two local_irq_save() / local_irq_restore() pairs. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-14s390/mm: Reimplement lazy ASCE handlingHeiko Carstens
Reduce system call overhead time (round trip time for invoking a non-existent system call) by 25%. With the removal of set_fs() [1] lazy control register handling was removed in order to keep kernel entry and exit simple. However this made system calls slower. With the conversion to generic entry [2] and numerous follow up changes which simplified the entry code significantly, adding support for lazy asce handling doesn't add much complexity to the entry code anymore. In particular this means: - On kernel entry the primary asce is not modified and contains the user asce - Kernel accesses which require secondary-space mode (for example futex operations) are surrounded by enable_sacf_uaccess() and disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce to kernel asce so that the sacf instruction can be used to switch to secondary-space mode. The primary asce is changed back to user asce with disable_sacf_uaccess(). The state of the control register which contains the primary asce is reflected with a new TIF_ASCE_PRIMARY bit. This is required on context switch so that the correct asce is restored for the scheduled in process. In result address spaces are now setup like this: CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE -----------------------------|-----------|-----------|----------- user space | user | user | kernel kernel (no sacf) | user | user | kernel kernel (during sacf uaccess) | kernel | user | kernel kernel (kvm guest execution) | guest | user | kernel In result cr1 control register content is not changed except for: - futex system calls - legacy s390 PCI system calls - the kvm specific cmpxchg_user_key() uaccess helper This leads to faster system call execution. [1] 87d598634521 ("s390/mm: remove set_fs / rework address space handling") [2] 56e62a737028 ("s390: convert to generic entry") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-13Linux 6.15-rc2v6.15-rc2Linus Torvalds
2025-04-13Merge tag 'erofs-for-6.15-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Properly handle errors when file-backed I/O fails - Fix compilation issues on ARM platform (arm-linux-gnueabi) - Fix parsing of encoded extents - Minor cleanup * tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: remove duplicate code erofs: fix encoded extents handling erofs: add __packed annotation to union(__le16..) erofs: set error to bio if file-backed IO fails
2025-04-13Merge tag 'ext4_for_linus-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A few more miscellaneous ext4 bug fixes and cleanups including some syzbot failures and fixing a stale file handing refeencing an inode previously used as a regular file, but which has been deleted and reused as an ea_inode would result in ext4 erroneously considering this a case of fs corruption" * tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix off-by-one error in do_split ext4: make block validity check resistent to sb bh corruption ext4: avoid -Wflex-array-member-not-at-end warning Documentation: ext4: Add fields to ext4_super_block documentation ext4: don't treat fhandle lookup of ea_inode as FS corruption
2025-04-13Merge tag 'fixes-2025-04-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "Fix build of memblock test. Add missing stubs for mutex and free_reserved_area() to memblock tests" * tag 'fixes-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock tests: Fix mutex related build error
2025-04-12ext4: fix off-by-one error in do_splitArtem Sadovnikov
Syzkaller detected a use-after-free issue in ext4_insert_dentry that was caused by out-of-bounds access due to incorrect splitting in do_split. BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109 Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847 CPU: 0 UID: 0 PID: 5847 Comm: syz-executor335 Not tainted 6.12.0-rc6-syzkaller-00318-ga9cda7c0ffed #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106 ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109 add_dirent_to_buf+0x3d9/0x750 fs/ext4/namei.c:2154 make_indexed_dir+0xf98/0x1600 fs/ext4/namei.c:2351 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2455 ext4_add_nondir+0x8d/0x290 fs/ext4/namei.c:2796 ext4_symlink+0x920/0xb50 fs/ext4/namei.c:3431 vfs_symlink+0x137/0x2e0 fs/namei.c:4615 do_symlinkat+0x222/0x3a0 fs/namei.c:4641 __do_sys_symlink fs/namei.c:4662 [inline] __se_sys_symlink fs/namei.c:4660 [inline] __x64_sys_symlink+0x7a/0x90 fs/namei.c:4660 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> The following loop is located right above 'if' statement. for (i = count-1; i >= 0; i--) { /* is more than half of this entry in 2nd half of the block? */ if (size + map[i].size/2 > blocksize/2) break; size += map[i].size; move++; } 'i' in this case could go down to -1, in which case sum of active entries wouldn't exceed half the block size, but previous behaviour would also do split in half if sum would exceed at the very last block, which in case of having too many long name files in a single block could lead to out-of-bounds access and following use-after-free. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Cc: stable@vger.kernel.org Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()") Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru Signed-off-by: Theodore Ts'o <tytso@mit.edu>