summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2024-11-01x86: support user address masking instead of non-speculative conditionalLinus Torvalds
commit 2865baf54077aa98fcdb478cefe6a42c417b9374 upstream. The Spectre-v1 mitigations made "access_ok()" much more expensive, since it has to serialize execution with the test for a valid user address. All the normal user copy routines avoid this by just masking the user address with a data-dependent mask instead, but the fast "unsafe_user_read()" kind of patterms that were supposed to be a fast case got slowed down. This introduces a notion of using src = masked_user_access_begin(src); to do the user address sanity using a data-dependent mask instead of the more traditional conditional if (user_read_access_begin(src, len)) { model. This model only works for dense accesses that start at 'src' and on architectures that have a guard region that is guaranteed to fault in between the user space and the kernel space area. With this, the user access doesn't need to be manually checked, because a bad address is guaranteed to fault (by some architecture masking trick: on x86-64 this involves just turning an invalid user address into all ones, since we don't map the top of address space). This only converts a couple of examples for now. Example x86-64 code generation for loading two words from user space: stac mov %rax,%rcx sar $0x3f,%rcx or %rax,%rcx mov (%rcx),%r13 mov 0x8(%rcx),%r14 clac where all the error handling and -EFAULT is now purely handled out of line by the exception path. Of course, if the micro-architecture does badly at 'clac' and 'stac', the above is still pitifully slow. But at least we did as well as we could. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-01objpool: fix choosing allocation for percpu slotsViktor Malik
[ Upstream commit aff1871bfc81e9dffa7d2a77e67cc5441cc37f81 ] objpool intends to use vmalloc for default (non-atomic) allocations of percpu slots and objects. However, the condition checking if GFP flags set any bit of GFP_ATOMIC is wrong b/c GFP_ATOMIC is a combination of bits (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the ___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL is specified, i.e. in all current usages of objpool. This may lead to unexpected OOM errors since kmalloc cannot allocate large amounts of memory. For instance, objpool is used by fprobe rethook which in turn is used by BPF kretprobe.multi and kprobe.session probe types. Trying to attach these to all kernel functions with libbpf using SEC("kprobe.session/*") int kprobe(struct pt_regs *ctx) { [...] } fails on objpool slot allocation with ENOMEM. Fix the condition to truly use vmalloc by default. Link: https://lore.kernel.org/all/20240826060718.267261-1-vmalik@redhat.com/ Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC") Signed-off-by: Viktor Malik <vmalik@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-01lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOWTimo Grautstueck
[ Upstream commit ab8851431bef5cc44f0f3f0da112e883fd4d0df5 ] Just a grammar fix in lib/Kconfig.debug, under the config option RUST_BUILD_ASSERT_ALLOW. Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://github.com/Rust-for-Linux/linux/issues/1006 Fixes: ecaa6ddff2fd ("rust: add `build_error` crate") Signed-off-by: Timo Grautstueck <timo.grautstueck@web.de> Link: https://lore.kernel.org/r/20241006140244.5509-1-timo.grautstueck@web.de Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-22maple_tree: correct tree corruption on spanning storeLorenzo Stoakes
commit bea07fd63192b61209d48cbb81ef474cc3ee4c62 upstream. Patch series "maple_tree: correct tree corruption on spanning store", v3. There has been a nasty yet subtle maple tree corruption bug that appears to have been in existence since the inception of the algorithm. This bug seems far more likely to happen since commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point at which reports started to be submitted concerning this bug. We were made definitely aware of the bug thanks to the kind efforts of Bert Karwatzki who helped enormously in my being able to track this down and identify the cause of it. The bug arises when an attempt is made to perform a spanning store across two leaf nodes, where the right leaf node is the rightmost child of the shared parent, AND the store completely consumes the right-mode node. This results in mas_wr_spanning_store() mitakenly duplicating the new and existing entries at the maximum pivot within the range, and thus maple tree corruption. The fix patch corrects this by detecting this scenario and disallowing the mistaken duplicate copy. The fix patch commit message goes into great detail as to how this occurs. This series also includes a test which reliably reproduces the issue, and asserts that the fix works correctly. Bert has kindly tested the fix and confirmed it resolved his issues. Also Mikhail Gavrilov kindly reported what appears to be precisely the same bug, which this fix should also resolve. This patch (of 2): There has been a subtle bug present in the maple tree implementation from its inception. This arises from how stores are performed - when a store occurs, it will overwrite overlapping ranges and adjust the tree as necessary to accommodate this. A range may always ultimately span two leaf nodes. In this instance we walk the two leaf nodes, determine which elements are not overwritten to the left and to the right of the start and end of the ranges respectively and then rebalance the tree to contain these entries and the newly inserted one. This kind of store is dubbed a 'spanning store' and is implemented by mas_wr_spanning_store(). In order to reach this stage, mas_store_gfp() invokes mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to walk the tree and update the object (mas) to traverse to the location where the write should be performed, determining its store type. When a spanning store is required, this function returns false stopping at the parent node which contains the target range, and mas_wr_store_type() marks the mas->store_type as wr_spanning_store to denote this fact. When we go to perform the store in mas_wr_spanning_store(), we first determine the elements AFTER the END of the range we wish to store (that is, to the right of the entry to be inserted) - we do this by walking to the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we have just determined contains the range over which we intend to write. We then turn our attention to the entries to the left of the entry we are inserting, whose state is represented by l_mas, and copy these into a 'big node', which is a special node which contains enough slots to contain two leaf node's worth of data. We then copy the entry we wish to store immediately after this - the copy and the insertion of the new entry is performed by mas_store_b_node(). After this we copy the elements to the right of the end of the range which we are inserting, if we have not exceeded the length of the node (i.e. r_mas.offset <= r_mas.end). Herein lies the bug - under very specific circumstances, this logic can break and corrupt the maple tree. Consider the following tree: Height 0 Root Node / \ pivot = 0xffff / \ pivot = ULONG_MAX / \ 1 A [-----] ... / \ pivot = 0x4fff / \ pivot = 0xffff / \ 2 (LEAVES) B [-----] [-----] C ^--- Last pivot 0xffff. Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note that all ranges expressed in maple tree code are inclusive): 1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then determines that this is a spanning store across nodes B and C. The mas state is set such that the current node from which we traverse further is node A. 2. In mas_wr_spanning_store() we try to find elements to the right of pivot 0xffff by searching for an index of 0x10000: - mas_wr_walk_index() invokes mas_wr_walk_descend() and mas_wr_node_walk() in turn. - mas_wr_node_walk() loops over entries in node A until EITHER it finds an entry whose pivot equals or exceeds 0x10000 OR it reaches the final entry. - Since no entry has a pivot equal to or exceeding 0x10000, pivot 0xffff is selected, leading to node C. - mas_wr_walk_traverse() resets the mas state to traverse node C. We loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk() in turn once again. - Again, we reach the last entry in node C, which has a pivot of 0xffff. 3. We then copy the elements to the left of 0x4000 in node B to the big node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry too. 4. We determine whether we have any entries to copy from the right of the end of the range via - and with r_mas set up at the entry at pivot 0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at pivot 0xffff. 5. BUG! The maple tree is corrupted with a duplicate entry. This requires a very specific set of circumstances - we must be spanning the last element in a leaf node, which is the last element in the parent node. spanning store across two leaf nodes with a range that ends at that shared pivot. A potential solution to this problem would simply be to reset the walk each time we traverse r_mas, however given the rarity of this situation it seems that would be rather inefficient. Instead, this patch detects if the right hand node is populated, i.e. has anything we need to copy. We do so by only copying elements from the right of the entry being inserted when the maximum value present exceeds the last, rather than basing this on offset position. The patch also updates some comments and eliminates the unused bool return value in mas_wr_walk_index(). The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()") seems to have made the probability of this event much more likely, which is the point at which reports started to be submitted concerning this bug. The motivation for this change arose from Bert Karwatzki's report of encountering mm instability after the release of kernel v6.12-rc1 which, after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration options, was identified as maple tree corruption. After Bert very generously provided his time and ability to reproduce this event consistently, I was able to finally identify that the issue discussed in this commit message was occurring for him. Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Bert Karwatzki <spasswolf@web.de> Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/ Tested-by: Bert Karwatzki <spasswolf@web.de> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/ Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10lib/buildid: harden build ID parsing logicAndrii Nakryiko
commit 905415ff3ffb1d7e5afa62bacabd79776bd24606 upstream. Harden build ID parsing logic, adding explicit READ_ONCE() where it's important to have a consistent value read and validated just once. Also, as pointed out by Andi Kleen, we need to make sure that entire ELF note is within a page bounds, so move the overflow check up and add an extra note_size boundaries validation. Fixes tag below points to the code that moved this code into lib/buildid.c, and then subsequently was used in perf subsystem, making this code exposed to perf_event_open() users in v5.12+. Cc: stable@vger.kernel.org Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Suggested-by: Andi Kleen <ak@linux.intel.com> Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04debugobjects: Fix conditions in fill_pool()Zhen Lei
commit 684d28feb8546d1e9597aa363c3bfcf52fe250b7 upstream. fill_pool() uses 'obj_pool_min_free' to decide whether objects should be handed back to the kmem cache. But 'obj_pool_min_free' records the lowest historical value of the number of objects in the object pool and not the minimum number of objects which should be kept in the pool. Use 'debug_objects_pool_min_level' instead, which holds the minimum number which was scaled to the number of CPUs at boot time. [ tglx: Massage change log ] Fixes: d26bf5056fc0 ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()") Fixes: 36c4ead6f6df ("debugobjects: Add global free list and the counter") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04lib/sbitmap: define swap_lock as raw_spinlock_tMing Lei
[ Upstream commit 65f666c6203600053478ce8e34a1db269a8701c9 ] When called from sbitmap_queue_get(), sbitmap_deferred_clear() may be run with preempt disabled. In RT kernel, spin_lock() can sleep, then warning of "BUG: sleeping function called from invalid context" can be triggered. Fix it by replacing it with raw_spin_lock. Cc: Yang Yang <yang.yang@vivo.com> Fixes: 72d04bdcf3f7 ("sbitmap: fix io hung due to race on sbitmap_word::cleared") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yang Yang <yang.yang@vivo.com> Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04xz: cleanup CRC32 edits from 2018Lasse Collin
[ Upstream commit 2ee96abef214550d9e92f5143ee3ac1fd1323e67 ] In 2018, a dependency on <linux/crc32poly.h> was added to avoid duplicating the same constant in multiple files. Two months later it was found to be a bad idea and the definition of CRC32_POLY_LE macro was moved into xz_private.h to avoid including <linux/crc32poly.h>. xz_private.h is a wrong place for it too. Revert back to the upstream version which has the poly in xz_crc32_init() in xz_crc32.c. Link: https://lkml.kernel.org/r/20240721133633.47721-10-lasse.collin@tukaani.org Fixes: faa16bc404d7 ("lib: Use existing define with polynomial") Fixes: 242cdad873a7 ("lib/xz: Put CRC32_POLY_LE in xz_private.h") Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Reviewed-by: Sam James <sam@gentoo.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Simon Glass <sjg@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-05Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fix fromShuah Khan: "One single fix to a use-after-free bug resulting from kunit_driver_create() failing to copy the driver name leaving it on the stack or freeing it" * tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Device wrappers should also manage driver name
2024-09-01alloc_tag: fix allocation tag reporting when CONFIG_MODULES=nSuren Baghdasaryan
codetag_module_init() is used to initialize sections containing allocation tags. This function is used to initialize module sections as well as core kernel sections, in which case the module parameter is set to NULL. This function has to be called even when CONFIG_MODULES=n to initialize core kernel allocation tag sections. When CONFIG_MODULES=n, this function is a NOP, which is wrong. This leads to /proc/allocinfo reported as empty. Fix this by making it independent of CONFIG_MODULES. Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com Fixes: 916cc5167cc6 ("lib: code tagging framework") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01maple_tree: remove rcu_read_lock() from mt_validate()Liam R. Howlett
The write lock should be held when validating the tree to avoid updates racing with checks. Holding the rcu read lock during a large tree validation may also cause a prolonged rcu read window and "rcu_preempt detected stalls" warnings. Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/ Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reported-by: syzbot+036af2f0c7338a33b0cd@syzkaller.appspotmail.com Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-29Merge tag 'random-6.11-rc6-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fix from Jason Donenfeld: "Reject invalid flags passed to vgetrandom() in the same way that getrandom() does, so that the behavior is the same, from Yann. The flags argument to getrandom() only has a behavioral effect on the function if the RNG isn't initialized yet, so vgetrandom() falls back to the syscall in that case. But if the RNG is initialized, all of the flags behave the same way, so vgetrandom() didn't bother checking them, and just ignored them entirely. But that doesn't account for invalid flags passed in, which need to be rejected so we can use them later" * tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: vDSO: reject unknown getrandom() flags
2024-08-26kunit: Device wrappers should also manage driver nameDavid Gow
kunit_driver_create() accepts a name for the driver, but does not copy it, so if that name is either on the stack, or otherwise freed, we end up with a use-after-free when the driver is cleaned up. Instead, strdup() the name, and manage it as another KUnit allocation. As there was no existing kunit_kstrdup(), we add one. Further, add a kunit_ variant of strdup_const() and kfree_const(), so we don't need to allocate and manage the string in the majority of cases where it's a constant. However, these are inline functions, and is_kernel_rodata() only works for built-in code. This causes problems in two cases: - If kunit is built as a module, __{start,end}_rodata is not defined. - If a kunit test using these functions is built as a module, it will suffer the same fate. This fixes a KASAN splat with overflow.overflow_allocation_test, when built as a module. Restrict the is_kernel_rodata() case to when KUnit is built as a module, which fixes the first case, at the cost of losing the optimisation. Also, make kunit_{kstrdup,kfree}_const non-inline, so that other modules using them will not accidentally depend on is_kernel_rodata(). If KUnit is built-in, they'll benefit from the optimisation, if KUnit is not, they won't, but the string will be properly duplicated. Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Reported-by: Nico Pache <npache@redhat.com> Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0 Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Rae Moar <rmoar@google.com> Signed-off-by: David Gow <davidgow@google.com> Tested-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-08-26random: vDSO: reject unknown getrandom() flagsYann Droneaud
Like the getrandom() syscall, vDSO getrandom() must also reject unknown flags. [1] It would be possible to return -EINVAL from vDSO itself, but in the possible case that a new flag is added to getrandom() syscall in the future, it would be easier to get the behavior from the syscall, instead of erroring until the vDSO is extended to support the new flag or explicitly falling back. [1] Designing the API: Planning for Extension https://docs.kernel.org/process/adding-syscalls.html#designing-the-api-planning-for-extension Signed-off-by: Yann Droneaud <yann@droneaud.fr> [Jason: reworded commit message] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-08-17Merge tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent OverstreetL - New on disk format version, bcachefs_metadata_version_disk_accounting_inum This adds one more disk accounting counter, which counts disk usage and number of extents per inode number. This lets us track fragmentation, for implementing defragmentation later, and it also counts disk usage per inode in all snapshots, which will be a useful thing to expose to users. - One performance issue we've observed is threads spinning when they should be waiting for dirty keys in the key cache to be flushed by journal reclaim, so we now have hysteresis for the waiting thread, as well as improving the tracepoint and a new time_stat, for tracking time blocked waiting on key cache flushing. ... and various assorted smaller fixes. * tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs: bcachefs: Fix locking in __bch2_trans_mark_dev_sb() bcachefs: fix incorrect i_state usage bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru bcachefs: Fix forgetting to pass trans to fsck_err() bcachefs: Increase size of cuckoo hash table on too many rehashes bcachefs: bcachefs_metadata_version_disk_accounting_inum bcachefs: Kill __bch2_accounting_mem_mod() bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in bcachefs: Add a time_stat for blocked on key cache flush bcachefs: Improve trans_blocked_journal_reclaim tracepoint bcachefs: Add hysteresis to waiting on btree key cache flush lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc() bcachefs: Convert for_each_btree_node() to lockrestart_do() bcachefs: Add missing downgrade table entry bcachefs: disk accounting: ignore unknown types bcachefs: bch2_accounting_invalid() fixup bcachefs: Fix bch2_trigger_alloc when upgrading from old versions bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
2024-08-15kunit/overflow: Fix UB in overflow_allocation_testIvan Orlov
The 'device_name' array doesn't exist out of the 'overflow_allocation_test' function scope. However, it is being used as a driver name when calling 'kunit_driver_create' from 'kunit_device_register'. It produces the kernel panic with KASAN enabled. Since this variable is used in one place only, remove it and pass the device name into kunit_device_register directly as an ascii string. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Reviewed-by: David Gow <davidgow@google.com> Link: https://lore.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-13lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()Kent Overstreet
If we need to increase the tree depth, allocate a new node, and then race with another thread that increased the tree depth before us, we'll still have a preallocated node that might be used later. If we then use that node for a new non-root node, it'll still have a pointer to the old root instead of being zeroed - fix this by zeroing it in the cmpxchg failure path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-28minmax: don't use max() in situations that want a C constant expressionLinus Torvalds
We only had a couple of array[] declarations, and changing them to just use 'MAX()' instead of 'max()' fixes the issue. This will allow us to simplify our min/max macros enormously, since they can now unconditionally use temporary variables to avoid using the argument values multiple times. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28minmax: make generic MIN() and MAX() macros available everywhereLinus Torvalds
This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-27Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull Rust updates from Miguel Ojeda: "The highlight is the establishment of a minimum version for the Rust toolchain, including 'rustc' (and bundled tools) and 'bindgen'. The initial minimum will be the pinned version we currently have, i.e. we are just widening the allowed versions. That covers three stable Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow), plus beta, plus nightly. This should already be enough for kernel developers in distributions that provide recent Rust compiler versions routinely, such as Arch Linux, Debian Unstable (outside the freeze period), Fedora Linux, Gentoo Linux (especially the testing channel), Nix (unstable) and openSUSE Slowroll and Tumbleweed. In addition, the kernel is now being built-tested by Rust's pre-merge CI. That is, every change that is attempting to land into the Rust compiler is tested against the kernel, and it is merged only if it passes. Similarly, the bindgen tool has agreed to build the kernel in their CI too. Thus, with the pre-merge CI in place, both projects hope to avoid unintentional changes to Rust that break the kernel. This means that, in general, apart from intentional changes on their side (that we will need to workaround conditionally on our side), the upcoming Rust compiler versions should generally work. In addition, the Rust project has proposed getting the kernel into stable Rust (at least solving the main blockers) as one of its three flagship goals for 2024H2 [1]. I would like to thank Niko, Sid, Emilio et al. for their help promoting the collaboration between Rust and the kernel. Toolchain and infrastructure: - Support several Rust toolchain versions. - Support several bindgen versions. - Remove 'cargo' requirement and simplify 'rusttest', thanks to 'alloc' having been dropped last cycle. - Provide proper error reporting for the 'rust-analyzer' target. 'kernel' crate: - Add 'uaccess' module with a safe userspace pointers abstraction. - Add 'page' module with a 'struct page' abstraction. - Support more complex generics in workqueue's 'impl_has_work!' macro. 'macros' crate: - Add 'firmware' field support to the 'module!' macro. - Improve 'module!' macro documentation. Documentation: - Provide instructions on what packages should be installed to build the kernel in some popular Linux distributions. - Introduce the new kernel.org LLVM+Rust toolchains. - Explain '#[no_std]'. And a few other small bits" Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals [1] * tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits) docs: rust: quick-start: add section on Linux distributions rust: warn about `bindgen` versions 0.66.0 and 0.66.1 rust: start supporting several `bindgen` versions rust: work around `bindgen` 0.69.0 issue rust: avoid assuming a particular `bindgen` build rust: start supporting several compiler versions rust: simplify Clippy warning flags set rust: relax most deny-level lints to warnings rust: allow `dead_code` for never constructed bindings rust: init: simplify from `map_err` to `inspect_err` rust: macros: indent list item in `paste!`'s docs rust: add abstraction for `struct page` rust: uaccess: add typed accessors for userspace pointers uaccess: always export _copy_[from|to]_user with CONFIG_RUST rust: uaccess: add userspace pointers kbuild: rust-analyzer: improve comment documentation kbuild: rust-analyzer: better error handling docs: rust: no_std is used rust: alloc: add __GFP_HIGHMEM flag rust: alloc: fix typo in docs for GFP_NOWAIT ...
2024-07-27Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "11 hotfixes, 7 of which are cc:stable. 7 are MM, 4 are other" * tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: handle inconsistent state in nilfs_btnode_create_block() selftests/mm: skip test for non-LPA2 and non-LVA systems mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node alloc_tag: outline and export free_reserved_page() decompress_bunzip2: fix rare decompression failure mm/huge_memory: avoid PMD-size page cache if needed mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines mm: fix old/young bit handling in the faulting path dt-bindings: arm: update James Clark's email address MAINTAINERS: mailmap: update James Clark's email address
2024-07-26decompress_bunzip2: fix rare decompression failureRoss Lagerwall
The decompression code parses a huffman tree and counts the number of symbols for a given bit length. In rare cases, there may be >= 256 symbols with a given bit length, causing the unsigned char to overflow. This causes a decompression failure later when the code tries and fails to find the bit length for a given symbol. Since the maximum number of symbols is 258, use unsigned short instead. Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: Alain Knaff <alain@knaff.lu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-26Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: "Random fixes" * tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux: riscv: Remove unnecessary int cast in variable_fls() radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c bitops: Add a comment explaining the double underscore macros lib: bitmap: add missing MODULE_DESCRIPTION() macros cpumask: introduce assign_cpu() macro
2024-07-25Merge tag 'printk-for-6.11-trivial' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - trivial printk changes The bigger "real" printk work is still being discussed. * tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsprintf: add missing MODULE_DESCRIPTION() macro printk: Rename console_replay_all() and update context
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-24Merge tag 'random-6.11-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-22Merge tag 'for-6.11/block-20240722' of git://git.kernel.dk/linuxLinus Torvalds
Pull more block updates from Jens Axboe: - MD fixes via Song: - md-cluster fixes (Heming Zhao) - raid1 fix (Mateusz Jończyk) - s390/dasd module description (Jeff) - Series cleaning up and hardening the blk-mq debugfs flag handling (John, Christoph) - blk-cgroup cleanup (Xiu) - Error polled IO attempts if backend doesn't support it (hexue) - Fix for an sbitmap hang (Yang) * tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits) blk-cgroup: move congestion_count to struct blkcg sbitmap: fix io hung due to race on sbitmap_word::cleared block: avoid polling configuration errors block: Catch possible entries missing from rqf_name[] block: Simplify definition of RQF_NAME() block: Use enum to define RQF_x bit indexes block: Catch possible entries missing from cmd_flag_name[] block: Catch possible entries missing from alloc_policy_name[] block: Catch possible entries missing from hctx_flag_name[] block: Catch possible entries missing from hctx_state_name[] block: Catch possible entries missing from blk_queue_flag_name[] block: Make QUEUE_FLAG_x as an enum block: Relocate BLK_MQ_MAX_DEPTH block: Relocate BLK_MQ_CPU_WORK_BATCH block: remove QUEUE_FLAG_STOPPED block: Add missing entry to hctx_flag_name[] block: Add zone write plugging entry to rqf_name[] block: Add missing entries from cmd_flag_name[] s390/dasd: fix error checks in dasd_copy_pair_store() s390/dasd: add missing MODULE_DESCRIPTION() macros ...
2024-07-21Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-19Merge tag 'char-misc-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc and other driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.11-rc1. Nothing major in here, just loads of new drivers and updates. Included in here are: - IIO api updates and new drivers added - wait_interruptable_timeout() api cleanups for some drivers - MODULE_DESCRIPTION() additions for loads of drivers - parport out-of-bounds fix - interconnect driver updates and additions - mhi driver updates and additions - w1 driver fixes - binder speedups and fixes - eeprom driver updates - coresight driver updates - counter driver update - new misc driver additions - other minor api updates All of these, EXCEPT for the final Kconfig build fix for 32bit systems, have been in linux-next for a while with no reported issues. The Kconfig fixup went in 29 hours ago, so might have missed the latest linux-next, but was acked by everyone involved" * tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits) misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems misc: delete Makefile.rej binder: fix hang of unregistered readers misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI virtio: add missing MODULE_DESCRIPTION() macro agp: uninorth: add missing MODULE_DESCRIPTION() macro spmi: add missing MODULE_DESCRIPTION() macros dev/parport: fix the array out-of-bounds risk samples: configfs: add missing MODULE_DESCRIPTION() macro misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver misc: keba: Fix missing AUXILIARY_BUS dependency slimbus: Fix struct and documentation alignment in stream.c MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches misc: fastrpc: use coherent pool for untranslated Compute Banks misc: fastrpc: support complete DMA pool access to the DSP misc: fastrpc: add missing MODULE_DESCRIPTION() macro misc: fastrpc: Add missing dev_err newlines misc: fastrpc: Use memdup_user() nvmem: core: Implement force_ro sysfs attribute nvmem: Use sysfs_emit() for type attribute ...
2024-07-19random: introduce generic vDSO getrandom() implementationJason A. Donenfeld
Provide a generic C vDSO getrandom() implementation, which operates on an opaque state returned by vgetrandom_alloc() and produces random bytes the same way as getrandom(). This has the following API signature: ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len); The return value and the first three arguments are the same as ordinary getrandom(), while the last two arguments are a pointer to the opaque allocated state and its size. Were all five arguments passed to the getrandom() syscall, nothing different would happen, and the functions would have the exact same behavior. The actual vDSO RNG algorithm implemented is the same one implemented by drivers/char/random.c, using the same fast-erasure techniques as that. Should the in-kernel implementation change, so too will the vDSO one. It requires an implementation of ChaCha20 that does not use any stack, in order to maintain forward secrecy if a multi-threaded program forks (though this does not account for a similar issue with SA_SIGINFO copying registers to the stack), so this is left as an architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm to implement on a variety of architectures, so this shouldn't be too onerous. Initially, the state is keyless, and so the first call makes a getrandom() syscall to generate that key, and then uses it for subsequent calls. By keeping track of a generation counter, it knows when its key is invalidated and it should fetch a new one using the syscall. Later, more than just a generation counter might be used. Since MADV_WIPEONFORK is set on the opaque state, the key and related state is wiped during a fork(), so secrets don't roll over into new processes, and the same state doesn't accidentally generate the same random stream. The generation counter, as well, is always >0, so that the 0 counter is a useful indication of a fork() or otherwise uninitialized state. If the kernel RNG is not yet initialized, then the vDSO always calls the syscall, because that behavior cannot be emulated in userspace, but fortunately that state is short lived and only during early boot. If it has been initialized, then there is no need to inspect the `flags` argument, because the behavior does not change post-initialization regardless of the `flags` value. Since the opaque state passed to it is mutated, vDSO getrandom() is not reentrant, when used with the same opaque state, which libc should be mindful of. The function works over an opaque per-thread state of a particular size, which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and VM_DROPPABLE for proper operation. Over time, the nuances of these allocations may change or grow or even differ based on architectural features. The opaque state passed to vDSO getrandom() must be allocated using the mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params struct, which also contains the size of each state. That struct can be obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then, libc can call mmap(2) and slice up the returned array into a state per each thread, while ensuring that no single state straddles a page boundary. Libc is expected to allocate a chunk of these on first use, and then dole them out to threads as they're created, allocating more when needed. vDSO getrandom() provides the ability for userspace to generate random bytes quickly and safely, and is intended to be integrated into libc's thread management. As an illustrative example, the introduced code in the vdso_test_getrandom self test later in this series might be used to do the same outside of libc. In a libc the various pthread-isms are expected to be elided into libc internals. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19Merge tag 'v6.11-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Test setkey in no-SIMD context - Add skcipher speed test for user-specified algorithm Algorithms: - Add x25519 support on ppc64le - Add VAES and AVX512 / AVX10 optimized AES-GCM on x86 - Remove sm2 algorithm Drivers: - Add Allwinner H616 support to sun8i-ce - Use DMA in stm32 - Add Exynos850 hwrng support to exynos" * tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits) hwrng: core - remove (un)register_miscdev() crypto: lib/mpi - delete unnecessary condition crypto: testmgr - generate power-of-2 lengths more often crypto: mxs-dcp - Ensure payload is zero when using key slot hwrng: Kconfig - Do not enable by default CN10K driver crypto: starfive - Fix nent assignment in rsa dec crypto: starfive - Align rsa input data to 32-bit crypto: qat - fix unintentional re-enabling of error interrupts crypto: qat - extend scope of lock in adf_cfg_add_key_value_param() Documentation: qat: fix auto_reset attribute details crypto: sun8i-ce - add Allwinner H616 support crypto: sun8i-ce - wrap accesses to descriptor address fields dt-bindings: crypto: sun8i-ce: Add compatible for H616 hwrng: core - Fix wrong quality calculation at hw rng registration hwrng: exynos - Enable Exynos850 support hwrng: exynos - Add SMC based TRNG operation hwrng: exynos - Implement bus clock control hwrng: exynos - Use devm_clk_get_enabled() to get the clock hwrng: exynos - Improve coding style dt-bindings: rng: Add Exynos850 support to exynos-trng ...
2024-07-19sbitmap: fix io hung due to race on sbitmap_word::clearedYang Yang
Configuration for sbq: depth=64, wake_batch=6, shift=6, map_nr=1 1. There are 64 requests in progress: map->word = 0xFFFFFFFFFFFFFFFF 2. After all the 64 requests complete, and no more requests come: map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF 3. Now two tasks try to allocate requests: T1: T2: __blk_mq_get_tag . __sbitmap_queue_get . sbitmap_get . sbitmap_find_bit . sbitmap_find_bit_in_word . __sbitmap_get_word -> nr=-1 __blk_mq_get_tag sbitmap_deferred_clear __sbitmap_queue_get /* map->cleared=0xFFFFFFFFFFFFFFFF */ sbitmap_find_bit if (!READ_ONCE(map->cleared)) sbitmap_find_bit_in_word return false; __sbitmap_get_word -> nr=-1 mask = xchg(&map->cleared, 0) sbitmap_deferred_clear atomic_long_andnot() /* map->cleared=0 */ if (!(map->cleared)) return false; /* * map->cleared is cleared by T1 * T2 fail to acquire the tag */ 4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken up due to the wake_batch being set at 6. If no more requests come, T1 will wait here indefinitely. This patch achieves two purposes: 1. Check on ->cleared and update on both ->cleared and ->word need to be done atomically, and using spinlock could be the simplest solution. 2. Add extra check in sbitmap_deferred_clear(), to identify whether ->word has free bits. Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits") Signed-off-by: Yang Yang <yang.yang@vivo.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-18Merge tag 'slab-for-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "The most prominent change this time is the kmem_buckets based hardening of kmalloc() allocations from Kees Cook. We have also extended the kmalloc() alignment guarantees for non-power-of-two sizes in a way that benefits rust. The rest are various cleanups and non-critical fixups. - Dedicated bucket allocator (Kees Cook) This series [1] enhances the probabilistic defense against heap spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year. kmalloc() users that are known to be useful for exploits can get completely separate set of kmalloc caches that can't be shared with other users. The first converted users are alloc_msg() and memdup_user(). The hardening is enabled by CONFIG_SLAB_BUCKETS. - Extended kmalloc() alignment guarantees (Vlastimil Babka) For years now we have guaranteed natural alignment for power-of-two allocations, but nothing was defined for other sizes (in practice, we have two such buckets, kmalloc-96 and kmalloc-192). To avoid unnecessary padding in the rust layer due to its alignment rules, extend the guarantee so that the alignment is at least the largest power-of-two divisor of the requested size. This fits what rust needs, is a superset of the existing power-of-two guarantee, and does not in practice change the layout (and thus does not add overhead due to padding) of the kmalloc-96 and kmalloc-192 caches, unless slab debugging is enabled for them. - Cleanups and non-critical fixups (Chengming Zhou, Suren Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka) Various tweaks related to the new alloc profiling code, folio conversion, debugging and more leftovers after SLAB" Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1] * tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/memcg: alignment memcg_data define condition mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING mm, slab: move allocation tagging code in the alloc path into a hook mm/util: Use dedicated slab buckets for memdup_user() ipc, msg: Use dedicated slab buckets for alloc_msg() mm/slab: Introduce kmem_buckets_create() and family mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument mm/slab: Plumb kmem_buckets into __do_kmalloc_node() mm/slab: Introduce kmem_buckets typedef slab, rust: extend kmalloc() alignment guarantees to remove Rust padding slab: delete useless RED_INACTIVE and RED_ACTIVE slab: don't put freepointer outside of object if only orig_size slab: make check_object() more consistent mm: Reduce the number of slab->folio casts mm, slab: don't wrap internal functions with alloc_hooks()
2024-07-18Merge tag 'bootconfig-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig update from Masami Hiramatsu: - Remove duplicate included header file linux/bootconfig.h from lib/bootconfig.c. This is a cleanup, no behavior change. * tag 'bootconfig-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Remove duplicate included header file linux/bootconfig.h
2024-07-18Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "There's a lot of stuff in here, amd, i915 and xe have new platform work, lots of core rework around EDID handling, some new COMPILE_TEST options, maintainer changes and a lots of other stuff. Summary: core: - deprecate DRM data and return 0 date - connector: Create a set of helpers to help with HDMI support - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - Sprinkle MODULE_DESCRIPTIONS everywhere they are missing - Remove drm_mm_replace_node - print: Add a drm prefix to warn level messages too, remove ___drm_dbg, consolidate prefix handling - New monochrome TV mode variant ttm: - improve number of page faults on some platforms - fix test builds under PREEMPT_RT - more test coverage ci: - Require a more recent version of mesa - improve farm setup and test generation dma-buf: - warn if reserving 0 fence slots - internal API heap enhancements fbdev: - Create memory manager optimized fbdev emulation panic: - Allow to select fonts - improve drm_fb_dma_get_scanout_buffer - Allow to dump kmsg to the screen bridge: - Remove redundant checks on bridge->encoder - Remove drm_bridge_chain_mode_fixup - bridge-connector: Plumb in the new HDMI helper - analogix_dp: Various improvements, handle AUX transfers timeout - samsung-dsim: Fix timings calculation - tc358767: Plenty of small fixes, fix no connector attach, fix clocks - sii902x: state validation improvements panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - More cleanup of prepare / enable state tracking in drivers - edp: Drop legacy panel compatibles - simple-bridge: Switch to devm_drm_bridge_add - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41, WL-355608-A8, PrimeView PM070WL4, Lincoln Technologies LCD197, Ortustech COM35H3P70ULC, AUO G104STN01, K&d kd101ne3-40ti amdgpu: - DCN 4.0.x support - GC 12.0 support - GMC 12.0 support - SDMA 7.0 support - MES12 support - MMHUB 4.1 support - GFX12 modifier and DCC support - lots of IP fixes/updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes - KFD GFX ALU exceptions i915: - Battlemage Xe2 HPD display enablement - Panel Replay enabling - DP AUX-less ALPM/LOBF - Enable link training failure fallback for DP MST links - CMRR (Content Match Refresh Rate) enabling - Increase ADL-S/ADL-P/DG2+ max TMDS bitrate to 6 Gbps - Enable eDP AUX based HDR backlight - Support replaying GPU hangs with captured context image - Automate CCS Mode setting during engine resets - lots of refactoring - Support replaying GPU hangs with captured context image - Increase FLR timeout from 3s to 9s - Enable w/a 16021333562 for DG2, MTL and ARL [guc] xe: - update MAINATINERS - New uapi adding OA functionality to Xe - expose l3 bank mask - fix display detect on ADL-N - runtime PM Fixes - Fix silent backmerge issues - More prep for SR-IOV - HWmon additions - per client usage info - Rework GPU page fault handling - Drop EXEC_QUEUE_FLAG_BANNED - Add BMG PCI IDs - Scheduler fixes and improvements - Rename xe_exec_queue::compute to xe_exec_queue::lr - Use ttm_uncached for BO with NEEDS_UC flag - Rename xe perf layer as xe observation layer - lots of refactoring radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings msm: - Validate registers XML description against schema in CI - core/dpu: SM7150 support - mdp5: Add support for MSM8937 - gpu: Add param for userspace to know if raytracing is supported - gpu: X185 support (aka gpu in X1 laptop chips) - gpu: a505 support ivpu: - hardware scheduler support - profiling support - improvements to the platform support layer - firmware handling improvements - clocks/power mgmt improvements - scheduler/logging improvements habanalabs: - Gradual sleep in polling memory macro - Reduce Gaudi2 MSI-X interrupt count to 128 - Add Gaudi2-D revision support - Add timestamp to CPLD info - Gaudi2: Assume hard-reset by firmware upon MC SEI severe error - Align Gaudi2 interrupt names - Check for errors after preboot is ready - Change habanalabs maintainer and git repo path mgag200: - refactoring and improvements - Add BMC output - enable polling nouveau: - add registry command line v3d: - perf counters improvements zynqmp: - irq and debugfs improvements atmel-hlcdc: - Support XLCDC in sam9x7 mipi-dbi: - Remove mipi_dbi_machine_little_endian - make SPI bits per word configurable - support RGB888 - allow pixel formats to be specified in the DT sun4i: - Rework the blender setup for DE2 panfrost: - Enable MT8188 support vc4: - Monochrome TV support exynos: - fix fallback mode regression - fix memory leak - Use drm_edid_duplicate() instead of kmemdup() etnaviv: - fix i.MX8MP NPU clock gating - workaround FE register cdc issues on some cores - fix DMA sync handling for cached buffers - fix job timeout handling - keep TS enabled on MMUv2 cores for improved performance mediatek: - Convert to platform remove callback returning void- - Drop chain_mode_fixup call in mode_valid() - Fixes the errors of MediaTek display driver found by IGT - Add display support for the MT8365-EVK board - Fix bit depth overwritten for mtk_ovl_set bit_depth() - Fix possible_crtcs calculation - Fix spurious kfree() ast: - refactor mode setting code stm: - Add LVDS support - DSI PHY updates" * tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel: (2501 commits) drm/amdgpu/mes12: add missing opcode string drm/amdgpu/mes11: update opcode strings Revert "drm/amd/display: Reset freesync config before update new state" drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB drm/xe: Drop trace_xe_hw_fence_free drm/xe/uapi: Rename xe perf layer as xe observation layer drm/amdgpu: remove exp hw support check for gfx12 drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed drm/amdgpu: flush all cached ras bad pages to eeprom drm/amdgpu: select compute ME engines dynamically drm/amd/display: Allow display DCC for DCN401 drm/amdgpu: select compute ME engines dynamically drm/amdgpu/job: Replace DRM_INFO/ERROR logging drm/amdgpu: select compute ME engines dynamically drm/amd/pm: Ignore initial value in smu response register drm/amdgpu: Initialize VF partition mode drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping MAINTAINERS: fix Xinhui's name MAINTAINERS: update powerplay and swsmu drm/qxl: Pin buffer objects for internal mappings ...
2024-07-16Merge tag 'net-next-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-16Merge tag 'linux_kselftest-kunit-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - add vm_mmap() allocation resource manager - convert usercopy kselftest to KUnit - disable usercopy testing on !CONFIG_MMU - add MODULE_DESCRIPTION() to core, list, and usercopy tests - add tests for assertion formatting functions - assert.c - introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros - fix KUNIT_ASSERT_STRNEQ comments to make it clear that it is an assertion - rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT * tag 'linux_kselftest-kunit-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT for readability kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion kunit: executor: Simplify string allocation handling kunit/usercopy: Add missing MODULE_DESCRIPTION() kunit/usercopy: Disable testing on !CONFIG_MMU usercopy: Convert test_user_copy to KUnit test kunit: test: Add vm_mmap() allocation resource manager list: test: add the missing MODULE_DESCRIPTION() macro kunit: add missing MODULE_DESCRIPTION() macros to core modules list: test: remove unused struct 'klist_test_struct' kunit: Cover 'assert.c' with tests
2024-07-16Merge tag 'sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove "->procname == NULL" check when iterating through sysctl table arrays Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. With all ctl_table sentinels gone, the additional check for ->procname == NULL that worked in tandem with the ARRAY_SIZE to calculate the size of the ctl_table arrays is no longer needed and has been removed. The sysctl register functions now returns an error if a sentinel is used. - Preparation patches for sysctl constification Constifying ctl_table structs prevents the modification of proc_handler function pointers as they would reside in .rodata. The ctl_table arguments in sysctl utility functions are const qualified in preparation for a future treewide proc_handler argument constification commit. - Misc fixes Increase robustness of set_ownership by providing sane default ownership values in case the callee doesn't set them. Bound check proc_dou8vec_minmax to avoid loading buggy modules and give sysctl testing module a name to avoid compiler complaints. * tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: Warn on an empty procname element sysctl: Remove ctl_table sentinel code comments sysctl: Remove "child" sysctl code comments sysctl: Remove superfluous empty allocations from sysctl internals sysctl: Replace nr_entries with ctl_table_size in new_links sysctl: Remove check for sentinel element in ctl_table arrays mm profiling: Remove superfluous sentinel element from ctl_table locking: Remove superfluous sentinel element from kern_lockdep_table sysctl: Add module description to sysctl-testing sysctl: constify ctl_table arguments of utility function utsname: constify ctl_table arguments of utility function sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array sysctl: always initialize i_uid/i_gid
2024-07-16Merge tag 'hardening-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - lkdtm/bugs: add test for hung smp_call_function_single() (Mark Rutland) - gcc-plugins: Remove duplicate included header file stringpool.h (Thorsten Blum) - ARM: Remove address checking for MMUless devices (Yanjun Yang) - randomize_kstack: Clean up per-arch entropy and codegen - KCFI: Make FineIBT mode Kconfig selectable - fortify: Do not special-case 0-sized destinations * tag 'hardening-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randomize_kstack: Improve stack alignment codegen ARM: Remove address checking for MMUless devices gcc-plugins: Remove duplicate included header file stringpool.h randomize_kstack: Remove non-functional per-arch entropy filtering fortify: Do not special-case 0-sized destinations x86/alternatives: Make FineIBT mode Kconfig selectable lkdtm/bugs: add test for hung smp_call_function_single()
2024-07-15Merge tag 'timers-core-2024-07-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and related functionality: Core: - Make the takeover of a hrtimer based broadcast timer reliable during CPU hot-unplug. The current implementation suffers from a race which can lead to broadcast timer starvation in the worst case. - VDSO related cleanups and simplifications - Small cleanups and enhancements all over the place PTP: - Replace the architecture specific base clock to clocksource, e.g. ART to TSC, conversion function with generic functionality to avoid exposing such internals to drivers and convert all existing drivers over. This also allows to provide functionality which converts the other way round in the core code based on the same parameter set. - Provide a function to convert CLOCK_REALTIME to the base clock to support the upcoming PPS output driver on Intel platforms. Drivers: - A set of Device Tree bindings for new hardware - Cleanups and enhancements all over the place" * tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) clocksource/drivers/realtek: Add timer driver for rtl-otto platforms dt-bindings: timer: Add schema for realtek,otto-timer dt-bindings: timer: Add SOPHGO SG2002 clint dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support dt-bindings: timer: renesas,tmu: Add RZ/G1 support dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support clocksource/drivers/mips-gic-timer: Correct sched_clock width clocksource/drivers/mips-gic-timer: Refine rating computation clocksource/drivers/sh_cmt: Address race condition for clock events clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq tick/broadcast: Make takeover of broadcast hrtimer reliable tick/sched: Combine WARN_ON_ONCE and print_once x86/vdso: Remove unused include x86/vgtod: Remove unused typedef gtod_long_t x86/vdso: Fix function reference in comment vdso: Add comment about reason for vdso struct ordering vdso/gettimeofday: Clarify comment about open coded function timekeeping: Add missing kernel-doc function comments tick: Remove unnused tick_nohz_get_idle_calls() ...
2024-07-15Merge tag 'core-debugobjects-2024-07-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects update from Thomas Gleixner: "A single update for debugobjects to annotate all intentionally racy global debug variables so that KCSAN ignores them" * tag 'core-debugobjects-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Annotate racy debug variables
2024-07-15Merge branch 'slab/for-6.11/buckets' into slab/for-nextVlastimil Babka
Merge all the slab patches previously collected on top of v6.10-rc1, over cleanups/fixes that had to be based on rc6.
2024-07-14Merge tag 'kbuild-fixes-v6.10-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Make scripts/ld-version.sh robust against the latest LLD - Fix warnings in rpm-pkg with device tree support - Fix warnings in fortify tests with KASAN * tag 'kbuild-fixes-v6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: fortify: fix warnings in fortify tests with KASAN kbuild: rpm-pkg: avoid the warnings with dtb's listed twice kbuild: Make ld-version.sh more robust against version string changes
2024-07-15fortify: fix warnings in fortify tests with KASANMasahiro Yamada
When a software KASAN mode is enabled, the fortify tests emit warnings on some architectures. For example, for ARCH=arm, the combination of CONFIG_FORTIFY_SOURCE=y and CONFIG_KASAN=y produces the following warnings: TEST lib/test_fortify/read_overflow-memchr.log warning: unsafe memchr() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memchr.c TEST lib/test_fortify/read_overflow-memchr_inv.log warning: unsafe memchr_inv() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memchr_inv.c TEST lib/test_fortify/read_overflow-memcmp.log warning: unsafe memcmp() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memcmp.c TEST lib/test_fortify/read_overflow-memscan.log warning: unsafe memscan() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memscan.c TEST lib/test_fortify/read_overflow2-memcmp.log warning: unsafe memcmp() usage lacked '__read_overflow2' warning in lib/test_fortify/read_overflow2-memcmp.c [ more and more similar warnings... ] Commit 9c2d1328f88a ("kbuild: provide reasonable defaults for tool coverage") removed KASAN flags from non-kernel objects by default. It was an intended behavior because lib/test_fortify/*.c are unit tests that are not linked to the kernel. As it turns out, some architectures require -fsanitize=kernel-(hw)address to define __SANITIZE_ADDRESS__ for the fortify tests. Without __SANITIZE_ADDRESS__ defined, arch/arm/include/asm/string.h defines __NO_FORTIFY, thus excluding <linux/fortify-string.h>. This issue does not occur on x86 thanks to commit 4ec4190be4cf ("kasan, x86: don't rename memintrinsics in uninstrumented files"), but there are still some architectures that define __NO_FORTIFY in such a situation. Set KASAN_SANITIZE=y explicitly to the fortify tests. Fixes: 9c2d1328f88a ("kbuild: provide reasonable defaults for tool coverage") Reported-by: Arnd Bergmann <arnd@arndb.de> Closes: https://lore.kernel.org/all/0e8dee26-41cc-41ae-9493-10cd1a8e3268@app.fastmail.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-13Merge tag 'timers-v6.11-rc1' of ↵Thomas Gleixner
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/event driver updates from Daniel Lezcano: - Remove unnecessary local variables initialization as they will be initialized in the code path anyway right after on the ARM arch timer and the ARM global timer (Li kunyu) - Fix a race condition in the interrupt leading to a deadlock on the SH CMT driver. Note that this fix was not tested on the platform using this timer but the fix seems reasonable enough to be picked confidently (Niklas Söderlund) - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang) - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven) - Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas Bonnefille) - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham) Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org
2024-07-13crypto: lib/mpi - delete unnecessary conditionDan Carpenter
We checked that "nlimbs" is non-zero in the outside if statement so delete the duplicate check here. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-07-12lib/bch.c: use swap() to improve codeThorsten Blum
Use the swap() macro to simplify the functions solve_linear_system() and gf_poly_gcd() and improve their readability. Remove the local variable tmp. Fixes the following three Coccinelle/coccicheck warnings reported by swap.cocci: WARNING opportunity for swap() WARNING opportunity for swap() WARNING opportunity for swap() Link: https://lkml.kernel.org/r/20240708224023.9312-2-thorsten.blum@toblux.com Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12test_bpf: convert comma to semicolonChen Ni
Replace commas between expression statements with semicolons. Link: https://lkml.kernel.org/r/20240709034323.586185-1-nichen@iscas.ac.cn Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12kunit: executor: Simplify string allocation handlingKees Cook
The alloc/copy code pattern is better consolidated to single kstrdup (and kstrndup) calls instead. This gets rid of deprecated[1] strncpy() uses as well. Replace one other strncpy() use with the more idiomatic strscpy(). Link: https://github.com/KSPP/linux/issues/90 [1] Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>