linux/linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
2025-05-27	Merge tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu	Linus Torvalds
	Pull rate-limit updates from Paul McKenney: "lib/ratelimit: Reduce false-positive and silent misses: - Reduce open-coded use of ratelimit_state structure fields. - Convert the ->missed field to atomic_t. - Count misses that are due to lock contention. - Eliminate jiffies=0 special case. - Reduce ___ratelimit() false-positive rate limiting (Petr Mladek). - Allow zero ->burst to hard-disable rate limiting. - Optimize away atomic operations when a miss is guaranteed. - Warn if ->interval or ->burst are negative (Petr Mladek). - Simplify the resulting code. A smoke test and stress test have been created, but they are not yet ready for mainline. With luck, we will offer them for the v6.17 merge window" * tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: ratelimit: Drop redundant accesses to burst ratelimit: Use nolock_ret restructuring to collapse common case code ratelimit: Use nolock_ret label to collapse lock-failure code ratelimit: Use nolock_ret label to save a couple of lines of code ratelimit: Simplify common-case exit path ratelimit: Warn if ->interval or ->burst are negative ratelimit: Avoid atomic decrement under lock if already rate-limited ratelimit: Avoid atomic decrement if already rate-limited ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE ratelimit: Force re-initialization when rate-limiting re-enabled ratelimit: Allow zero ->burst to disable ratelimiting ratelimit: Reduce ___ratelimit() false-positive rate limiting ratelimit: Avoid jiffies=0 special case ratelimit: Count misses due to lock contention ratelimit: Convert the ->missed field to atomic_t drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals drm/i915: Avoid open-coded use of ratelimit_state structure's ->missed field random: Avoid open-coded use of ratelimit_state structure's ->missed field ratelimit: Create functions to handle ratelimit_state internals
2025-05-27	Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds
	Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-12	crypto: lib/chacha - add strongly-typed state zeroization	Eric Biggers
	Now that the ChaCha state matrix is strongly-typed, add a helper function chacha_zeroize_state() which zeroizes it. Then convert all applicable callers to use it instead of direct memzero_explicit. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12	crypto: lib/chacha - strongly type the ChaCha state	Eric Biggers
	The ChaCha state matrix is 16 32-bit words. Currently it is represented in the code as a raw u32 array, or even just a pointer to u32. This weak typing is error-prone. Instead, introduce struct chacha_state: struct chacha_state { u32 x[16]; }; Convert all ChaCha and HChaCha functions to use struct chacha_state. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
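The benefit of the wrapper struct can be sketched in plain userspace C. This is a minimal sketch, not the kernel code: `u32` is typedef'd locally, `CHACHA_STATE_WORDS` is defined locally, and `demo_zeroize_state()` is a hypothetical stand-in for `chacha_zeroize_state()` that uses volatile stores because `memzero_explicit()` is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-in for the kernel's u32 */

#define CHACHA_STATE_WORDS 16	/* defined locally for this sketch */

/* Same layout as quoted in the commit message: 16 32-bit words. */
struct chacha_state {
	u32 x[CHACHA_STATE_WORDS];
};

/* The size is now part of the type, so it can be checked at compile time. */
static_assert(sizeof(struct chacha_state) == 64, "state is 16 x 32 bits");

/*
 * Hypothetical stand-in for chacha_zeroize_state(). The volatile pointer
 * keeps the compiler from optimizing the stores away.
 */
static void demo_zeroize_state(struct chacha_state *state)
{
	volatile u32 *words = state->x;
	size_t i;

	for (i = 0; i < CHACHA_STATE_WORDS; i++)
		words[i] = 0;
}

/* Returns 1 if every word of the state is zero, 0 otherwise. */
static int demo_state_is_zero(const struct chacha_state *state)
{
	size_t i;

	for (i = 0; i < CHACHA_STATE_WORDS; i++)
		if (state->x[i] != 0)
			return 0;
	return 1;
}
```

With a bare `u32 *` parameter, a caller could pass a buffer of any length; with `struct chacha_state *`, passing anything other than a full 16-word state is a type error.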
2025-05-08	random: Avoid open-coded use of ratelimit_state structure's ->missed field	Paul E. McKenney
	The _credit_init_bits() function directly accesses the ratelimit_state structure's ->missed field, which works, but which also makes it more difficult to change this field. Therefore, make use of the ratelimit_state_get_miss() and ratelimit_state_inc_miss() functions instead of directly accessing the ->missed field. Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/ Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> "Jason A. Donenfeld" <Jason@zx2c4.com>
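The accessor pattern described above can be illustrated in userspace with C11 atomics. This is a rough sketch under stated assumptions: `struct demo_rl_state`, `demo_rl_inc_miss()` and `demo_rl_get_miss()` are hypothetical names that only mirror the idea of `ratelimit_state_inc_miss()` and `ratelimit_state_get_miss()`; the kernel's own definitions may differ.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical miniature of a rate-limit state; only the miss counter. */
struct demo_rl_state {
	atomic_int missed;
};

/* Callers bump the counter through a helper, never through the field. */
static void demo_rl_inc_miss(struct demo_rl_state *rs)
{
	atomic_fetch_add_explicit(&rs->missed, 1, memory_order_relaxed);
}

/* Callers read the counter through a helper as well. */
static int demo_rl_get_miss(struct demo_rl_state *rs)
{
	return atomic_load_explicit(&rs->missed, memory_order_relaxed);
}
```

Because no caller touches `missed` directly, the field's type can change (here it is already atomic) by editing only the two helpers.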
2025-05-08	treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()	Ingo Molnar
	Move this API to the canonical timer_*() namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250507175338.672442-10-mingo@kernel.org
2025-05-08	treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()	Ingo Molnar
	Move this API to the canonical timer_*() namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250507175338.672442-9-mingo@kernel.org
2025-04-05	treewide: Switch/rename to timer_delete[_sync]()	Thomas Gleixner
	timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21	vdso: Add generic random data storage	Thomas Weißschuh
	Extend the generic vDSO data storage with a page for the random state data. The random state data is stored in a dedicated page, as the existing storage page is only meant for time-related, time-namespace-aware data. This simplifies the access logic to not need to handle time namespaces anymore and also frees up more space in the time-related page. In case further generic vDSO data store is required it can be added to the random state page. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-6-13a4669dfc8c@linutronix.de
2025-01-28	treewide: const qualify ctl_tables where applicable	Joel Granados
	Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/infiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2024-09-13	random: vDSO: add __arch_get_k_vdso_rng_data() helper for data page access	Christophe Leroy
	_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided so that all architectures can provide the requested pointer. Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data() and don't use x86 _vdso_rng_data directly. Until now vdso/vsyscall.h was only included by time/vsyscall.c but now it will also be included in char/random.c, leading to a duplicate declaration of _vdso_data and _vdso_rng_data. To fix this issue, move the declaration in a C file. vma.c looks like the most appropriate candidate. We don't need to replace the definitions in vsyscall.h by declarations as declarations are already in asm/vvar.h. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13	random: vDSO: don't use 64-bit atomics on 32-bit architectures	Christophe Leroy
	Performing SMP atomic operations on u64 fails on powerpc32: CC drivers/char/random.o In file included from <command-line>: drivers/char/random.c: In function 'crng_reseed': ././include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_391' declared with attribute error: Need native word sized stores/loads for atomicity. 510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ ././include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert' 491 | prefix ## suffix(); \ | ^~~~~~ ././include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert' 510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ ././include/linux/compiler_types.h:513:9: note: in expansion of macro 'compiletime_assert' 513 | compiletime_assert(__native_word(t), \ | ^~~~~~~~~~~~~~~~~~ ./arch/powerpc/include/asm/barrier.h:74:9: note: in expansion of macro 'compiletime_assert_atomic_type' 74 | compiletime_assert_atomic_type(*p); \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/asm-generic/barrier.h:172:55: note: in expansion of macro '__smp_store_release' 172 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0) | ^~~~~~~~~~~~~~~~~~~ drivers/char/random.c:286:9: note: in expansion of macro 'smp_store_release' 286 | smp_store_release(&__arch_get_k_vdso_rng_data()->generation, next_gen + 1); | ^~~~~~~~~~~~~~~~~ The kernel-side generation counter in the random driver is handled as an unsigned long, not as a u64, in base_crng and struct crng. But on the vDSO side, it needs to be an u64, not just an unsigned long, in order to support a 32-bit vDSO atop a 64-bit kernel. On kernel side, however, it is an unsigned long, hence a 32-bit value on 32-bit architectures, so just cast it to unsigned long for the smp_store_release().
A side effect is that on big endian architectures the store will be performed in the upper 32 bits. It is not an issue on its own because the vDSO side doesn't mind the value, as it only checks differences. Just make sure that the vDSO side checks the full 64 bits. For that, the local current_generation has to be u64 as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
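The endianness argument can be checked with a small userspace sketch. It assumes a hypothetical 64-bit slot that a 32-bit writer updates through its first four bytes only (modelled with `memcpy()`), while the reader compares all 64 bits; `demo_store_generation32()` and `demo_generation_changed()` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model a word-sized store on a 32-bit kernel: only the first four bytes
 * of the 64-bit slot are written. On little endian these are the low
 * bits, on big endian the high bits.
 */
static void demo_store_generation32(uint64_t *slot, uint32_t gen)
{
	memcpy(slot, &gen, sizeof(gen));
}

/* Reader side: compare the full 64 bits, never a truncated copy. */
static int demo_generation_changed(uint64_t cached, const uint64_t *slot)
{
	return cached != *slot;
}
```

The reader never interprets the numeric value, so it does not matter which half of the slot changes; it only matters that the comparison covers both halves.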
2024-07-24	sysctl: treewide: constify the ctl_table argument of proc_handlers	Joel Granados
	const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
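Why the handler signature has to change first can be sketched in userspace. Everything here is hypothetical and simplified: `struct demo_ctl_table`, `demo_dointvec()` and `demo_table` only imitate the shape of the kernel types, and the handler drops the `loff_t *ppos` argument for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ctl_table;

/* The handler takes a pointer to a const table entry. */
typedef int (*demo_proc_handler)(const struct demo_ctl_table *ctl, int write,
				 void *buffer, size_t *lenp);

struct demo_ctl_table {
	const char *procname;
	int *data;			/* the tunable itself stays writable */
	demo_proc_handler proc_handler;
};

/* Reads or writes one int; the table entry itself is never modified. */
static int demo_dointvec(const struct demo_ctl_table *ctl, int write,
			 void *buffer, size_t *lenp)
{
	int *val = buffer;

	if (*lenp < sizeof(int))
		return -1;
	if (write)
		*ctl->data = *val;
	else
		*val = *ctl->data;
	*lenp = sizeof(int);
	return 0;
}

static int demo_value = 42;

/* Because every handler accepts a const pointer, the array can be const. */
static const struct demo_ctl_table demo_table[] = {
	{ .procname = "demo_value", .data = &demo_value,
	  .proc_handler = demo_dointvec },
};
```

If `demo_dointvec()` took a non-const pointer, passing `&demo_table[0]` to it would discard the qualifier, which is why the handlers were converted before the tables.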
2024-07-19	random: introduce generic vDSO getrandom() implementation	Jason A. Donenfeld
	Provide a generic C vDSO getrandom() implementation, which operates on an opaque state returned by vgetrandom_alloc() and produces random bytes the same way as getrandom(). This has the following API signature: ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len); The return value and the first three arguments are the same as ordinary getrandom(), while the last two arguments are a pointer to the opaque allocated state and its size. Were all five arguments passed to the getrandom() syscall, nothing different would happen, and the functions would have the exact same behavior. The actual vDSO RNG algorithm implemented is the same one implemented by drivers/char/random.c, using the same fast-erasure techniques as that. Should the in-kernel implementation change, so too will the vDSO one. It requires an implementation of ChaCha20 that does not use any stack, in order to maintain forward secrecy if a multi-threaded program forks (though this does not account for a similar issue with SA_SIGINFO copying registers to the stack), so this is left as an architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm to implement on a variety of architectures, so this shouldn't be too onerous. Initially, the state is keyless, and so the first call makes a getrandom() syscall to generate that key, and then uses it for subsequent calls. By keeping track of a generation counter, it knows when its key is invalidated and it should fetch a new one using the syscall. Later, more than just a generation counter might be used. Since MADV_WIPEONFORK is set on the opaque state, the key and related state is wiped during a fork(), so secrets don't roll over into new processes, and the same state doesn't accidentally generate the same random stream. The generation counter, as well, is always >0, so that the 0 counter is a useful indication of a fork() or otherwise uninitialized state.
If the kernel RNG is not yet initialized, then the vDSO always calls the syscall, because that behavior cannot be emulated in userspace, but fortunately that state is short lived and only during early boot. If it has been initialized, then there is no need to inspect the `flags` argument, because the behavior does not change post-initialization regardless of the `flags` value. Since the opaque state passed to it is mutated, vDSO getrandom() is not reentrant, when used with the same opaque state, which libc should be mindful of. The function works over an opaque per-thread state of a particular size, which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and VM_DROPPABLE for proper operation. Over time, the nuances of these allocations may change or grow or even differ based on architectural features. The opaque state passed to vDSO getrandom() must be allocated using the mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params struct, which also contains the size of each state. That struct can be obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then, libc can call mmap(2) and slice up the returned array into a state per each thread, while ensuring that no single state straddles a page boundary. Libc is expected to allocate a chunk of these on first use, and then dole them out to threads as they're created, allocating more when needed. vDSO getrandom() provides the ability for userspace to generate random bytes quickly and safely, and is intended to be integrated into libc's thread management. As an illustrative example, the introduced code in the vdso_test_getrandom self test later in this series might be used to do the same outside of libc. In a libc the various pthread-isms are expected to be elided into libc internals. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
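The requirement that no single state straddles a page boundary comes down to a little index arithmetic, sketched below. The helper names are hypothetical, and the sizes used in the usage note (4096-byte pages, 144-byte states) are made-up illustration values; real code would take the state size from the vgetrandom_opaque_params struct and the page size from the system.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte offset of state number `index` inside an mmap()ed region, packing
 * as many whole states as fit into each page and skipping the leftover
 * bytes at the end of the page.
 */
static size_t demo_state_offset(size_t index, size_t state_size,
				size_t page_size)
{
	size_t per_page;

	assert(state_size > 0 && state_size <= page_size);
	per_page = page_size / state_size;
	return (index / per_page) * page_size + (index % per_page) * state_size;
}

/* Returns 1 if [offset, offset + state_size) crosses a page boundary. */
static int demo_straddles_page(size_t offset, size_t state_size,
			       size_t page_size)
{
	return offset / page_size != (offset + state_size - 1) / page_size;
}
```

With 4096-byte pages and 144-byte states, 28 states fit per page; naive packing would place state 28 at byte 4032, across the boundary, whereas `demo_state_offset()` places it at byte 4096.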
2024-04-17	random: handle creditable entropy from atomic process context	Jason A. Donenfeld
	The entropy accounting changes a static key when the RNG has initialized, since it only ever initializes once. Static key changes, however, cannot be made from atomic context, so depending on where the last creditable entropy comes from, the static key change might need to be deferred to a worker. Previously the code used the execute_in_process_context() helper function, which accounts for whether or not the caller is in_interrupt(). However, that doesn't account for the case where the caller is actually in process context but is holding a spinlock. This turned out to be the case with input_handle_event() in drivers/input/input.c contributing entropy: [<ffffffd613025ba0>] die+0xa8/0x2fc [<ffffffd613027428>] bug_handler+0x44/0xec [<ffffffd613016964>] brk_handler+0x90/0x144 [<ffffffd613041e58>] do_debug_exception+0xa0/0x148 [<ffffffd61400c208>] el1_dbg+0x60/0x7c [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90 [<ffffffd613011294>] el1h_64_sync+0x64/0x6c [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8 [<ffffffd613102b54>] __might_sleep+0x44/0x7c [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec [<ffffffd6132c2820>] static_key_enable+0x14/0x38 [<ffffffd61400ac08>] crng_set_ready+0x14/0x28 [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8 [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270 [<ffffffd613857e54>] add_input_randomness+0x38/0x48 [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490 [<ffffffd613a81310>] input_event+0x6c/0x98 According to Guoyong, it's not really possible to refactor the various drivers to never hold a spinlock there. And in_atomic() isn't reliable. So, rather than trying to be too fancy, just punt the change in the static key to a workqueue always. 
There's basically no drawback of doing this, as the code already needed to account for the static key not changing immediately, and given that it's just an optimization, there's not exactly a hurry to change the static key right away, so deferral is fine. Reported-by: Guoyong Wang <guoyong.wang@mediatek.com> Cc: stable@vger.kernel.org Fixes: f5bda35fba61 ("random: use static branch for crng_ready()") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-12-05	iov_iter: replace import_single_range() with import_ubuf()	Jens Axboe
	With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-05	iov_iter: remove unused 'iov' argument from import_single_range()	Jens Axboe
	It is entirely unused, just get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-11	char-misc: Remove the now superfluous sentinel element from ctl_table array	Joel Granados
	This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link: https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from ipmi_table and random_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
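The saving from dropping a sentinel can be seen in a small userspace sketch. The struct and table names are hypothetical; the point is only that a table whose length is known at compile time (via an `ARRAY_SIZE()`-style macro, defined locally here) does not need an empty terminating element.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct demo_entry {
	const char *procname;
};

/* Old style: iteration stops at the empty trailing element. */
static const struct demo_entry demo_with_sentinel[] = {
	{ "poolsize" },
	{ "entropy_avail" },
	{ NULL },		/* sentinel */
};

/* New style: the length comes from the array type itself. */
static const struct demo_entry demo_without_sentinel[] = {
	{ "poolsize" },
	{ "entropy_avail" },
};

/* Counts entries by walking until the sentinel is found. */
static size_t demo_count_by_sentinel(const struct demo_entry *table)
{
	size_t n = 0;

	while (table[n].procname != NULL)
		n++;
	return n;
}
```

Both tables describe two entries, but the first one carries one extra `struct demo_entry` worth of memory purely as a terminator.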
2023-05-24	tty, proc, kernfs, random: Use copy_splice_read()	David Howells
	Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Arnd Bergmann <arnd@arndb.de> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-06	cpumask: fix incorrect cpumask scanning result checks	Linus Torvalds
	It turns out that commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") exposed a number of cases of drivers not checking the result of "cpumask_next()" and friends correctly. The documented correct check for "no more cpus in the cpumask" is to check for the result being equal or larger than the number of possible CPU ids, exactly _because_ we've always done those constant-sized cpumask scans using a widened type before. So the return value of a cpumask scan should be checked with if (cpu >= nr_cpu_ids) ... because the cpumask scan did not necessarily stop exactly at that maximum CPU id. But a few cases ended up instead using checks like if (cpu == nr_cpumask_bits) ... which used that internal "widened" number of bits. And that used to work pretty much by accident (ok, in this case "by accident" is simply because it matched the historical internal implementation of the cpumask scanning, so it was more of a "intentionally using implementation details rather than an accident"). But the extended constant-sized optimizations then did that internal implementation differently, and now that code that did things wrong but matched the old implementation no longer worked at all. Which then causes subsequent odd problems due to using what ends up being an invalid CPU ID. Most of these cases require either unusual hardware or special uses to hit, but the random.c one triggers quite easily. All you really need is to have a sufficiently small CONFIG_NR_CPUS value for the bit scanning optimization to be triggered, but not enough CPUs to then actually fill that widened cpumask. At that point, the cpumask scanning will return the NR_CPUS constant, which is _not_ the same as nr_cpumask_bits. This just does the mindless fix with sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/' to fix the incorrect uses. 
The ones in the SCSI lpfc driver in particular could probably be fixed more cleanly by just removing that repeated pattern entirely, but I am not emotionally invested enough in that driver to care. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/ Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/ Reported-by: Vernon Yang <vernon2gm@gmail.com> Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/ Cc: Yury Norov <yury.norov@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
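The failure mode described in this commit can be reproduced with a toy scan in userspace. This is only a model: `DEMO_NR_CPUS` plays the role of the compile-time scan width, `demo_scan_from()` is a hypothetical stand-in for a cpumask scan that returns that width when no bit is found, and the two check helpers encode the buggy and the fixed comparison.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 64u	/* compile-time width used by the scan */

/* Returns the first set bit at or after `start`, or DEMO_NR_CPUS if none. */
static unsigned int demo_scan_from(uint64_t mask, unsigned int start)
{
	unsigned int cpu;

	for (cpu = start; cpu < DEMO_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return DEMO_NR_CPUS;
}

/* Buggy check: only matches if the scan stops exactly at this value. */
static int demo_scan_done_buggy(unsigned int cpu, unsigned int nr_cpumask_bits)
{
	return cpu == nr_cpumask_bits;
}

/* Fixed check: anything at or past the number of CPU ids means "done". */
static int demo_scan_done_fixed(unsigned int cpu, unsigned int nr_cpu_ids)
{
	return cpu >= nr_cpu_ids;
}
```

On a four-CPU system the scan past the last CPU returns 64 here, so comparing for equality against 4 misses the end of the mask while the `>=` comparison catches it.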
2022-12-20	random: do not include <asm/archrandom.h> from random.h	Jason A. Donenfeld
	The <asm/archrandom.h> header is a random.c private detail, not something to be called by other code. As such, don't make it automatically available by way of random.h. Cc: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-12	Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls
2022-12-04	random: align entropy_timer_state to cache line	Jason A. Donenfeld
	The theory behind the jitter dance is that multiple things are poking at the same cache line. This only works, however, if what's being poked at is actually all in the same cache line. Ensure this is the case by aligning the struct on the stack to the cache line size. We can't use ____cacheline_aligned on a stack variable, because gcc assumes 16 byte alignment when only 8 byte alignment is provided by the kernel, which means gcc could technically do something pathological like `(rsp & ~48) - 64`. It doesn't, but rather than risk it, just do the stack alignment manually with PTR_ALIGN and an oversized buffer. Fixes: 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it") Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
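Manual alignment with an oversized buffer can be sketched as follows. The macro and struct are hypothetical userspace stand-ins (`DEMO_PTR_ALIGN` for `PTR_ALIGN`, a fixed 64-byte `DEMO_CACHE_LINE` for the real cache line size, `struct demo_timer_state` for `entropy_timer_state`); the sketch only computes the aligned pointer and does not dereference it.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u	/* assumed cache line size for this sketch */

/* Round a pointer up to the next multiple of `a` (a power of two). */
#define DEMO_PTR_ALIGN(p, a) \
	((void *)(((uintptr_t)(p) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1)))

struct demo_timer_state {
	unsigned long entropy;
	unsigned int samples;
};

/* Bytes to reserve so that an aligned struct always fits inside. */
#define DEMO_STATE_BUF_SIZE (sizeof(struct demo_timer_state) + DEMO_CACHE_LINE - 1)

/* Returns a cache-line-aligned location inside the oversized buffer. */
static struct demo_timer_state *demo_align_state(unsigned char *buf)
{
	return DEMO_PTR_ALIGN(buf, DEMO_CACHE_LINE);
}
```

Rounding up moves the pointer forward by at most `DEMO_CACHE_LINE - 1` bytes, which is exactly the slack added to the buffer, so the aligned struct can never run past the end.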
2022-12-04	random: mix in cycle counter when jitter timer fires	Jason A. Donenfeld
	Rather than just relying on interaction between cache lines of the timer and the main loop, also explicitly take into account the fact that the timer might fire at some time that's hard to predict, due to scheduling, interrupts, or cross-CPU conditions. Mix in a cycle counter during the firing of the timer, in addition to the existing one during the scheduling of the timer. It can't hurt and can only help. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-04	random: spread out jitter callback to different CPUs	Jason A. Donenfeld
	Rather than merely hoping that the callback gets called on another CPU, arrange for that to actually happen, by round robining which CPU the timer fires on. This way, on multiprocessor machines, we exacerbate jitter by touching the same memory from multiple different cores. There's a little bit of tricky bookkeeping involved here, because using timer_setup_on_stack() + add_timer_on() + del_timer_sync() will result in a use after free. See this sample code: <https://xn--4db.cc/xBdEiIKO/c>. Instead, it's necessary to call [try_to_]del_timer_sync() before calling add_timer_on(), so that the final call to del_timer_sync() at the end of the function actually succeeds at making sure no handlers are running. Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
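The round-robin selection can be modelled in userspace with a 64-bit mask. This is a sketch of the selection idea only, under the assumption that the next timer CPU is the next set bit after the previous one, wrapping around and skipping the current CPU when another is available; all names are hypothetical and none of the timer bookkeeping described above is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_CPUS 64u

/* Next set bit strictly after `after`, or DEMO_MAX_CPUS if none. */
static unsigned int demo_next_cpu(uint64_t mask, unsigned int after)
{
	unsigned int cpu;

	for (cpu = after + 1; cpu < DEMO_MAX_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return DEMO_MAX_CPUS;
}

/* Lowest set bit, or DEMO_MAX_CPUS if the mask is empty. */
static unsigned int demo_first_cpu(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_MAX_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return DEMO_MAX_CPUS;
}

/* Number of set bits in the mask. */
static unsigned int demo_count_cpus(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Pick the CPU after `prev`, wrapping, and avoid `self` when possible. */
static unsigned int demo_pick_timer_cpu(uint64_t mask, unsigned int prev,
					unsigned int self)
{
	unsigned int num_cpus = demo_count_cpus(mask);
	unsigned int cpu = prev;

	assert(num_cpus > 0);
	do {
		cpu = demo_next_cpu(mask, cpu);
		if (cpu >= DEMO_MAX_CPUS)
			cpu = demo_first_cpu(mask);
	} while (cpu == self && num_cpus > 1);
	return cpu;
}
```

With CPUs 0, 1 and 3 available and the caller running on CPU 1, successive picks starting from CPU 0 go to CPU 3 and then wrap to CPU 0; on a single-CPU mask the caller's own CPU is returned.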
2022-11-29	random: remove extraneous period and add a missing one in comments	Jason A. Donenfeld
	Just some trivial typo fixes, and reflowing of lines. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-25	use less confusing names for iov_iter direction initializers	Al Viro
	READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-22	random: add back async readiness notifier	Jason A. Donenfeld
	This is required by vsprintf, because it can't do things synchronously from hardirq context, and it will be useful for an EFI notifier as well. I didn't initially want to do this, but with two potential consumers now, it seems worth it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: reseed in delayed work rather than on-demand	Jason A. Donenfeld
	Currently, we reseed when random bytes are requested, if the current seed is too old. Since random bytes can be requested from all contexts, including hard IRQ, this means sometimes we wind up adding a bit of latency to hard IRQ. This was so much of a problem on s390x that now s390x just doesn't provide its architectural RNG from hard IRQ context, so we miss out in that case. Instead, let's just schedule a persistent delayed work, so that the reseeding and potentially expensive operations will always happen from process context, reducing unexpected latencies from hard IRQ. This also has the nice effect of accumulating a transcript of random inputs over time, since it means that we amass more input values. And it should make future vDSO integration a bit easier. Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	hw_random: use add_hwgenerator_randomness() for early entropy	Jason A. Donenfeld
	Rather than calling add_device_randomness(), the add_early_randomness() function should use add_hwgenerator_randomness(), so that the early entropy can be potentially credited, which allows for the RNG to initialize earlier without having to wait for the kthread to come up. This requires some minor API refactoring, by adding a `sleep_after` parameter to add_hwgenerator_randomness(), so that we don't hit a blocking sleep from add_early_randomness(). Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: modernize documentation comment on get_random_bytes()	Jason A. Donenfeld
	The prior text was very old and made outdated references to TCP sequence numbers, which should use one of the integer functions instead, since batched entropy was introduced. The current way of describing the quality of functions is just to say that it's as good as /dev/urandom, which now all the functions are. Fixes: f5b98461cb81 ("random: use chacha20 for get_random_int/long") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: adjust comment to account for removed function	Jason A. Donenfeld
	Since de492c83cae0 ("prandom: remove unused functions"), get_random_int() no longer exists, so remove its reference from this comment. Fixes: de492c83cae0 ("prandom: remove unused functions") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: remove early archrandom abstraction	Jason A. Donenfeld
	The arch_get_random_early() abstraction is not completely useful and adds complexity, because it's not a given that there will be no calls to arch_get_random() between random_init_early(), which uses arch_get_random_early(), and init_cpu_features(). During that gap, crng_reseed() might be called, which uses arch_get_random(), since it's mostly not init code. Instead we can test whether we're in the early phase in arch_get_random() itself, and in doing so avoid all ambiguity about where we are. Fortunately, the only architecture that currently implements arch_get_random_early() also has an alternatives-based cpu feature system, one flag of which determines whether the other flags have been initialized. This makes it possible to do the early check with zero cost once the system is initialized. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: use random.trust_{bootloader,cpu} command line option only	Jason A. Donenfeld
	It's very unusual to have both a command line option and a compile time option, and apparently that's confusing to people. Also, basically everybody enables the compile time option now, which means people who want to disable this wind up having to use the command line option to ensure that anyway. So just reduce the number of moving pieces and nix the compile time option in favor of the more versatile command line option. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18	random: add helpers for random numbers with given floor or range	Jason A. Donenfeld
	Now that we have get_random_u32_below(), it's nearly trivial to make inline helpers to compute get_random_u32_above() and get_random_u32_inclusive(), which will help clean up open coded loops and manual computations throughout the tree. One snag is that in order to make get_random_u32_inclusive() operate on closed intervals, we have to do some (unlikely) special case handling if get_random_u32_inclusive(0, U32_MAX) is called. The least expensive way of doing this is actually to adjust the slowpath of get_random_u32_below() to have its undefined 0 result just return the output of get_random_u32(). We can make this basically free by calling get_random_u32() before the branch, so that the branch latency gets interleaved. Cc: stable@vger.kernel.org # to ease future backports that use this api Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
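The helpers described above can be sketched in user-space C. This is only an illustration under stated assumptions: `demo_random_u32()` is a hypothetical, non-cryptographic stand-in for the kernel's get_random_u32(), and every `demo_` name is invented for this sketch rather than taken from the kernel source. It shows how a zero bound falling through to a raw 32-bit value lets the closed interval [0, U32_MAX] work without extra branches in the helpers.

```c
#include <stdint.h>

/* Hypothetical stand-in for get_random_u32(): xorshift32, NOT secure. */
static uint32_t demo_state = 0x2545f491u;

static uint32_t demo_random_u32(void)
{
	demo_state ^= demo_state << 13;
	demo_state ^= demo_state >> 17;
	demo_state ^= demo_state << 5;
	return demo_state;
}

/* Uniform value in [0, limit); a limit of 0 yields a raw 32-bit value. */
static uint32_t demo_u32_below(uint32_t limit)
{
	/* Draw before branching so the zero case costs nothing extra. */
	uint32_t r = demo_random_u32();
	uint64_t mult;

	if (limit == 0)
		return r;

	mult = (uint64_t)limit * r;
	if ((uint32_t)mult < limit) {
		uint32_t bound = (uint32_t)(0u - limit) % limit; /* 2^32 mod limit */

		while ((uint32_t)mult < bound)
			mult = (uint64_t)limit * demo_random_u32();
	}
	return (uint32_t)(mult >> 32);
}

/* Uniform value in (lo, UINT32_MAX]; assumes lo < UINT32_MAX. */
static uint32_t demo_u32_above(uint32_t lo)
{
	return lo + 1 + demo_u32_below(UINT32_MAX - lo);
}

/* Uniform value in [lo, hi]; assumes lo <= hi. */
static uint32_t demo_u32_inclusive(uint32_t lo, uint32_t hi)
{
	/* For (0, UINT32_MAX) the span wraps to 0, which selects the raw path. */
	return lo + demo_u32_below(hi - lo + 1);
}
```

For example, `demo_u32_inclusive(1, 6)` would model a die roll, and `demo_u32_inclusive(0, UINT32_MAX)` reduces to a single raw draw.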
2022-11-17	random: use rejection sampling for uniform bounded random integers	Jason A. Donenfeld
	Until the very recent commits, many bounded random integers were calculated using `get_random_u32() % max_plus_one`, which not only incurs the price of a division -- indicating performance mostly was not a real issue -- but also does not result in a uniformly distributed output if max_plus_one is not a power of two. Recent commits moved to using `prandom_u32_max(max_plus_one)`, which replaces the division with a faster multiplication, but still does not solve the issue with non-uniform output. For some users, maybe this isn't a problem, and for others, maybe it is, but for the majority of users, probably the question has never been posed and analyzed, and nobody thought much about it, probably assuming random is random is random. In other words, the unthinking expectation of most users is likely that the resultant numbers are uniform. So we implement here an efficient way of generating uniform bounded random integers. Through use of compile-time evaluation, and avoiding divisions as much as possible, this commit introduces no measurable overhead. At least for hot-path uses tested, any potential difference was lost in the noise. On both clang and gcc, code generation is pretty small. The new function, get_random_u32_below(), lives in random.h, rather than prandom.h, and has a "get_random_xxx" function name, because it is suitable for all uses, including cryptography. In order to be efficient, we implement a kernel-specific variant of Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. The kernel's variant takes advantage of constant folding to avoid divisions entirely in the vast majority of cases, works on both 32-bit and 64-bit architectures, and requests a minimal amount of bytes from the RNG. Link: https://arxiv.org/pdf/1805.10941.pdf Cc: stable@vger.kernel.org # to ease future backports that use this api Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
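A minimal user-space sketch of the multiply-and-reject approach from Lemire's paper follows. It is not the kernel's implementation: `demo_random_u32()` is a hypothetical, non-cryptographic stand-in for get_random_u32(), the `demo_` names are invented here, and the compile-time constant folding mentioned in the message is left out. The bound is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical stand-in for get_random_u32(): xorshift32, NOT secure. */
static uint32_t demo_state = 0x9e3779b9u;

static uint32_t demo_random_u32(void)
{
	demo_state ^= demo_state << 13;
	demo_state ^= demo_state >> 17;
	demo_state ^= demo_state << 5;
	return demo_state;
}

/*
 * Uniform value in [0, limit), assuming limit != 0.
 *
 * The high 32 bits of limit * r pick the result; the low 32 bits tell
 * us whether r fell into the small region that would bias the output.
 */
static uint32_t demo_u32_below(uint32_t limit)
{
	uint64_t mult = (uint64_t)limit * demo_random_u32();

	/* Fast path: a low half >= limit can never be in the biased region. */
	if ((uint32_t)mult < limit) {
		/* 2^32 mod limit: the only division, taken on the slow path. */
		uint32_t bound = (uint32_t)(0u - limit) % limit;

		/* Reject and redraw while the low half is below the bound. */
		while ((uint32_t)mult < bound)
			mult = (uint64_t)limit * demo_random_u32();
	}
	return (uint32_t)(mult >> 32);
}
```

By contrast, `demo_random_u32() % limit` divides on every call and over-represents small results whenever `limit` does not divide 2^32.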
2022-10-29	random: use arch_get_random*_early() in random_init()	Jean-Philippe Brucker
	While reworking the archrandom handling, commit d349ab99eec7 ("random: handle archrandom with multiple longs") switched to the non-early archrandom helpers in random_init(), which broke initialization of the entropy pool from the arm64 random generator. Indeed at that point the arm64 CPU features, which verify that all CPUs have compatible capabilities, are not finalized so arch_get_random_seed_longs() is unsuccessful. Instead random_init() should use the _early functions, which check only the boot CPU on arm64. On other architectures the _early functions directly call the normal ones. Fixes: d349ab99eec7 ("random: handle archrandom with multiple longs") Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11	prandom: remove unused functions	Jason A. Donenfeld
	With no callers left of prandom_u32() and prandom_bytes(), as well as get_random_int(), remove these deprecated wrappers, in favor of get_random_u32() and get_random_bytes(). Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-06	random: clear new batches when bringing new CPUs online	Jason A. Donenfeld
	The commit that added the new get_random_{u8,u16}() functions neglected to update the code that clears the batches when bringing up a new CPU. It also forgot a few comments and helper defines, so add those in too. Fixes: 585cd5fe9f73 ("random: add 8-bit and 16-bit batches") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-01	random: fix typos in get_random_bytes() comment	William Zijl
	Remove extra whitespace and add a missing word to a sentence describing get_random_bytes(). Signed-off-by: William Zijl <postmaster@gusted.xyz> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-01	random: schedule jitter credit for next jiffy, not in two jiffies	Jason A. Donenfeld
	Counterintuitively, mod_timer(..., jiffies + 1) will cause the timer to fire not in the next jiffy, but in two jiffies. The way to cause the timer to fire in the next jiffy is with mod_timer(..., jiffies). Doing so then lets us bump the upper bound back up again. Fixes: 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it") Fixes: 829d680e82a9 ("random: cap jitter samples per bit to factor of HZ") Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-29	random: add 8-bit and 16-bit batches	Jason A. Donenfeld
	There are numerous places in the kernel that would be sped up by having smaller batches. Currently those callsites do `get_random_u32() & 0xff` or similar. Since these are pretty spread out, and will require patches to multiple different trees, let's get ahead of the curve and lay the foundation for `get_random_u8()` and `get_random_u16()`, so that it's then possible to start submitting conversion patches leisurely. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-29	random: use init_utsname() instead of utsname()	Jason A. Donenfeld
	Rather than going through the current-> indirection for utsname, at this point in boot, init_utsname()==utsname(), so just use it directly that way. Additionally, init_utsname() appears to be available nearly always, so move it into random_init_early(). Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-29	random: split initialization into early step and later step	Jason A. Donenfeld
	The full RNG initialization relies on some timestamps, made possible with initialization functions like time_init() and timekeeping_init(). However, these are only available rather late in initialization. Meanwhile, other things, such as memory allocator functions, make use of the RNG much earlier. So split RNG initialization into two phases. We can provide arch randomness very early on, and then later, after timekeeping and such are available, initialize the rest. This ensures that, for example, slabs are properly randomized if RDRAND is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree of its security, because its random seed is potentially deterministic, since it hasn't yet incorporated RDRAND. It also makes it possible to use a better seed in kfence, which currently relies on only the cycle counter. Another positive consequence is that on systems with RDRAND, running with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all. One subtle side effect of this change is that on systems with no RDRAND, RDTSC is now only queried by random_init() once, committing the moment of the function call, instead of multiple times as before. This is intentional, as the multiple RDTSCs in a loop before weren't accomplishing very much, with jitter being better provided by try_to_generate_entropy(). Plus, filling blocks with RDTSC is still being done in extract_entropy(), which is necessarily called before random bytes are served anyway. Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-28	random: use expired timer rather than wq for mixing fast pool	Jason A. Donenfeld
	Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing: do_some_stuff() spin_lock() do_some_other_stuff() spin_unlock() to doing: do_some_stuff() queue_work_on(some_other_stuff_worker) This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message: > MT27500 Family [ConnectX-3] cards: > Infiniband device 'mlx4_0' port 1 status: > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1 > base lid: 0x6 > sm lid: 0x1 > state: 4: ACTIVE > phys state: 5: LinkUp > rate: 40 Gb/sec (4X QDR) > link_layer: InfiniBand > > Cards are configured with IP addresses on private subnet for IPoIB > performance testing. > Regression identified in this bug is in TCP latency in this stack as reported > by qperf tcp_lat metric: > > We have one system listen as a qperf server: > [root@yourQperfServer ~]# qperf > > Have the other system connect to qperf server as a client (in this > case, it’s X7 server with Mellanox card): > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once. Reported-by: Sherry Yang <sherry.yang@oracle.com> Tested-by: Paul Webb <paul.x.webb@oracle.com> Cc: Sherry Yang <sherry.yang@oracle.com> Cc: Phillip Goerl <phillip.goerl@oracle.com> Cc: Jack Vogel <jack.vogel@oracle.com> Cc: Nicky Veitch <nicky.veitch@oracle.com> Cc: Colm Harrington <colm.harrington@oracle.com> Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Tejun Heo <tj@kernel.org> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: stable@vger.kernel.org Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-28	random: avoid reading two cache lines on irq randomness	Jason A. Donenfeld
	In order to avoid reading and dirtying two cache lines on every IRQ, move the work_struct to the bottom of the fast_pool struct. add_interrupt_randomness() always touches .pool and .count, which are currently split, because .mix pushes everything down. Instead, move .mix to the bottom, so that .pool and .count are always in the first cache line, since .mix is only accessed when the pool is full. Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
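The layout effect can be illustrated with `offsetof` in user-space C. The sketch below rests on assumptions that are not in the commit message: a 64-byte cache line, a 64-bit build where the work item occupies 32 bytes (modelled by the hypothetical `struct demo_work`), and a `last` field between `pool` and `count`. All struct and helper names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u /* assumed cache line size in bytes */

/* Hypothetical 32-byte stand-in for the kernel's struct work_struct. */
struct demo_work {
	uint64_t data;
	uint64_t entry[2];
	uint64_t func;
};

/* Layout before the change: the work item pushes everything down. */
struct demo_fast_pool_before {
	struct demo_work mix;
	uint64_t pool[4];
	uint64_t last;
	uint32_t count;
};

/* Layout after the change: the hot fields come first. */
struct demo_fast_pool_after {
	uint64_t pool[4];
	uint64_t last;
	uint32_t count;
	struct demo_work mix;
};

/* Index of the cache line that holds a given byte offset. */
static size_t demo_cache_line_of(size_t offset)
{
	return offset / DEMO_CACHE_LINE;
}
```

Under these assumptions, `pool` and `count` land on lines 0 and 1 in the first layout and both on line 0 in the second, so the per-interrupt path touches one line instead of two.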
2022-09-23	random: clamp credited irq bits to maximum mixed	Jason A. Donenfeld
	Since the most that's mixed into the pool is sizeof(long)*2, don't credit more than that many bytes of entropy. Fixes: e3e33fc2ea7f ("random: do not use input pool from hard IRQs") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23	random: throttle hwrng writes if no entropy is credited	Jason A. Donenfeld
	If a hwrng source does not provide an entropy estimate, it currently does not contribute at all to the CRNG. In order to help fix this, in case add_hwgenerator_randomness() is called with the entropy parameter set to zero, go to sleep until one reseed interval has passed. While the hwrng thread currently only runs under conditions where this is non-zero, this change is not harmful and prepares for future updates to the hwrng core. Cc: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23	random: use hwgenerator randomness more frequently at early boot	Dominik Brodowski
	Mix in randomness from hw-rng sources more frequently during early boot, approximately once for every rng reseed. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23	random: restore O_NONBLOCK support	Jason A. Donenfeld
	Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace. So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again. In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar. Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua <guozihua@huawei.com> Reported-by: Zhongguohua <zhongguohua1@huawei.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andrew Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
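From user space, the restored behaviour might be exercised roughly as follows. This is a sketch, not code from the patch: `demo_read_nonblock()` is a hypothetical helper that uses only the standard open(2), read(2) and close(2) calls, and the -EAGAIN interpretation is taken from the commit message above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from path without blocking.
 * Returns the number of bytes read, or a negative errno value. Per the
 * commit message, -EAGAIN from /dev/random means the pool is not yet
 * initialized, which can only happen during early boot.
 */
static ssize_t demo_read_nonblock(const char *path, void *buf, size_t len)
{
	ssize_t ret;
	int saved_errno;
	int fd;

	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	ret = read(fd, buf, len);
	saved_errno = errno; /* close() may overwrite errno */
	close(fd);

	return ret < 0 ? -saved_errno : ret;
}
```

A caller would pass "/dev/random" and treat a return of -EAGAIN as "try again later" rather than as a fatal error.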