linux/linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
2022-11-21	Merge branch 'slab/for-6.2/alloc_size' into slab/for-next	Vlastimil Babka
	Two patches from Kees Cook [1]: These patches work around a deficiency in GCC (>=11) and Clang (<16) where the __alloc_size attribute does not apply to inlines. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503 This manifests as reduced overflow detection coverage for many allocation sites under CONFIG_FORTIFY_SOURCE=y, where the allocation size was not actually being propagated to __builtin_dynamic_object_size(). [1] https://lore.kernel.org/all/20221118034713.gonna.754-kees@kernel.org/
2022-11-21	Merge branch 'slab/for-6.2/kmalloc_redzone' into slab/for-next	Vlastimil Babka
	kmalloc() redzone improvements by Feng Tang. From the cover letter [1]: The kmalloc API family is critical for mm, and one of its characteristics is that it rounds the request size up to a fixed size (mostly a power of 2). When a user requests '2^n + 1' bytes, 2^(n+1) bytes may actually be allocated, so there is extra space beyond what was originally requested. This patchset extends the redzone sanity check to that extra kmalloc'ed space, to better detect illegitimate access to it (depends on SLAB_STORE_USER & SLAB_RED_ZONE). [1] https://lore.kernel.org/all/20221021032405.1825078-1-feng.tang@intel.com/
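To make the '2^n + 1' case concrete, here is a minimal userspace sketch of kmalloc-style size classes. It assumes an 8-byte minimum class, power-of-two classes, and the extra 96- and 192-byte classes; the real set of classes depends on architecture and configuration, and the function names are hypothetical (this is not kmalloc_index() or kmalloc_size_roundup()).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified kmalloc size-class model (assumptions: 8-byte minimum class,
 * power-of-two classes plus the 96- and 192-byte classes).
 */
static size_t sketch_kmalloc_class(size_t request)
{
	size_t size = 8;

	assert(request > 0);	/* 0-byte requests get ZERO_SIZE_PTR in the kernel */
	if (request > 64 && request <= 96)
		return 96;
	if (request > 128 && request <= 192)
		return 192;
	while (size < request)
		size <<= 1;
	return size;
}

/* Bytes handed out beyond the request: the region the new redzone check covers. */
static size_t sketch_kmalloc_slack(size_t request)
{
	return sketch_kmalloc_class(request) - request;
}
```

For n = 10, a 1025-byte request lands in the 2048-byte class, leaving 1023 bytes of slack that only the extended redzone check would notice being written.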
2022-11-21	Merge branch 'slab/for-6.2/fit_rcu_head' into slab/for-next	Vlastimil Babka
	A series by myself to reorder fields in struct slab to allow the embedded rcu_head to grow (for debugging purposes). Requires changes to isolate_movable_page() to skip slab pages which can otherwise become false-positive __PageMovable due to its use of low bits in page->mapping.
2022-11-21	Merge branch 'slab/for-6.2/tools' into slab/for-next	Vlastimil Babka
	A patch for tools/vm/slabinfo to give more useful feedback when not run as root, by Rong Tao.
2022-11-21	Merge branch 'slab/for-6.2/slub-sysfs' into slab/for-next	Vlastimil Babka
	- Two patches for SLUB's sysfs by Rasmus Villemoes to remove dead code and optimize boot time with late initialization. - Allow SLUB's sysfs 'failslab' parameter to be runtime-controllable again as it can be both useful and safe, by Alexander Atanasov.
2022-11-21	Merge branch 'slab/for-6.2/locking' into slab/for-next	Vlastimil Babka
	A patch from Jiri Kosina that makes SLAB's list_lock a raw_spinlock_t. While there are no plans to make SLAB actually compatible with PREEMPT_RT or any other future, it makes !PREEMPT_RT lockdep happy.
2022-11-21	Merge branch 'slab/for-6.2/cleanups' into slab/for-next	Vlastimil Babka
	- Removal of dead code from deactivate_slab() by Hyeonggon Yoo. - Fix of BUILD_BUG_ON() for sufficient early percpu size by Baoquan He. - Make kmem_cache_alloc() kernel-doc less misleading, by myself.
2022-11-21	slab: Remove special-casing of const 0 size allocations	Kees Cook
	Passing a constant-0 size allocation into kmalloc() or kmalloc_node() does not need to be a fast-path operation, so the static return value can be removed entirely. This makes sure that all paths through the inlines result in a full extern function call, where __alloc_size() hints will actually be seen[1] by GCC. (A constant return value of 0 means the "0" allocation size won't be propagated by the inline.) [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503 Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-21	slab: Clean up SLOB vs kmalloc() definition	Kees Cook
	As already done for kmalloc_node(), clean up the #ifdef usage in the definition of kmalloc() so that the SLOB-only version is an entirely separate and much more readable function. Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-21	mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head	Vlastimil Babka
	Joel reports [1] that increasing the rcu_head size for debugging purposes used to work before struct slab was split from struct page, but now runs into the various SLAB_MATCH() sanity checks of the layout. This is because the rcu_head in struct page is in union with large sub-structures and has space to grow without exceeding their size, while in struct slab (for SLAB and SLUB) it's in union only with a list_head. On closer inspection (and after the previous patch) we can put all fields except slab_cache to a union with rcu_head, as slab_cache is sufficient for the rcu freeing callbacks to work and the rest can be overwritten by rcu_head without causing issues. This is only somewhat complicated by the need to keep SLUB's freelist+counters aligned for cmpxchg_double. As a result the fields need to be reordered so that slab_cache is first (after page flags) and the union with rcu_head follows. For consistency, do that for SLAB as well, although not necessary there. As a result, the rcu_head field in struct page and struct slab is no longer at the same offset, but that doesn't matter as there is no casting that would rely on that in the slab freeing callbacks, so we can just drop the respective SLAB_MATCH() check. Also we need to update the SLAB_MATCH() for compound_head to reflect the new ordering. While at it, also add a static_assert to check the alignment needed for cmpxchg_double so mistakes are found sooner than a runtime GPF. [1] https://lore.kernel.org/all/85afd876-d8bb-0804-b2c5-48ed3055e702@joelfernandes.org/ Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
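The layout idea can be sketched with simplified stand-in types: slab_cache sits outside the union so the RCU callback can still use it, everything else shares storage with rcu_head, and a static_assert catches a misaligned freelist/counters pair at build time. The struct and field names below are hypothetical and much smaller than the real struct slab.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's list_head and rcu_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

/* Simplified layout following the new ordering described above. */
struct sketch_slab {
	unsigned long page_flags;	/* overlays page->flags, stays first */
	void *slab_cache;		/* needed by the RCU free callback: kept outside the union */
	union {
		struct {
			struct sketch_list_head slab_list;
			void *freelist;		/* freelist + counters: the cmpxchg_double pair */
			unsigned long counters;
		};
		struct sketch_rcu_head rcu_head;	/* may grow up to the union's size */
	};
};

/* Catch a misaligned freelist/counters pair at build time rather than as a runtime GPF. */
static_assert(offsetof(struct sketch_slab, freelist) % (2 * sizeof(void *)) == 0,
	      "freelist/counters must be double-word aligned for cmpxchg_double");
```

Because rcu_head overlays only fields the callback no longer needs, a debug build can enlarge it without any SLAB_MATCH()-style offset check against struct page.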
2022-11-21	mm/migrate: make isolate_movable_page() skip slab pages	Vlastimil Babka
	In the next commit we want to rearrange struct slab fields to allow a larger rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct list_head slab_list", where the value of the prev pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA. Unfortunately, bit 1 being set can make PageMovable() return a false positive and cause a GPF, as reported by lkp [1]. To fix this, make isolate_movable_page() skip pages with the PageSlab flag set. This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page allocation and freeing, and their counterparts to isolate_movable_page(). Based on my RFC from [2]. Added a comment update from Matthew's variant in [3] and, as done there, moved the PageSlab checks to happen before trying to take the page lock. [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/ [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/ [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/ Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
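Why LIST_POISON2 trips the movable check: 0x122 is binary 1_0010_0010, so its low two bits read as 0b10, which is exactly PAGE_MAPPING_MOVABLE. The sketch below applies the same bit test as __PageMovable() to a raw value; POISON_POINTER_DELTA is assumed to be the x86-64 default CONFIG_ILLEGAL_POINTER_VALUE, which leaves the low bits untouched anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* page->mapping tag bits, as defined in include/linux/page-flags.h. */
#define PAGE_MAPPING_ANON	0x1ULL
#define PAGE_MAPPING_MOVABLE	0x2ULL
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/* Assumption: x86-64's default CONFIG_ILLEGAL_POINTER_VALUE. */
#define POISON_POINTER_DELTA	0xdead000000000000ULL
#define LIST_POISON2		(0x122ULL + POISON_POINTER_DELTA)

/* Same bit test as __PageMovable(), applied to a raw mapping value. */
static bool sketch_mapping_is_movable(uint64_t mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}
```

A properly aligned pointer (low bits 0b00) or an anon mapping (0b01) does not match, but the poison value does, which is why the fix checks PageSlab before looking at page->mapping at all.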
2022-11-21	mm/slab: move and adjust kernel-doc for kmem_cache_alloc	Vlastimil Babka
	Alexander reports an issue with the kmem_cache_alloc() comment in mm/slab.c: > The current comment mentioned that the flags only matters if the > cache has no available objects. It's different for the __GFP_ZERO > flag which will ensure that the returned object is always zeroed > in any case. > I have the feeling I run into this question already two times if > the user need to zero the object or not, but the user does not need > to zero the object afterwards. However another use of __GFP_ZERO > and only zero the object if the cache has no available objects would > also make no sense. and thus suggests mentioning __GFP_ZERO as the exception. But on closer inspection, the part about flags being relevant only if the cache has no available objects is misleading. The slab user has no reliable way to determine if there are available objects, and e.g. the might_sleep() debug check can be performed even if objects are available, so passing correct flags given the allocation context always matters. Thus remove that sentence completely, and while at it, move the comment from the SLAB-specific mm/slab.c to the common include/linux/slab.h. The comment otherwise refers to the flags description for kmalloc(), so add a __GFP_ZERO comment there and remove a very misleading GFP_HIGHUSER (not applicable to slab) description from there. Mention the kzalloc() and kmem_cache_zalloc() shortcuts. Reported-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-21	mm/slub, percpu: correct the calculation of early percpu allocation size	Baoquan He
	The SLUB allocator relies on the percpu allocator to initialize its ->cpu_slab during early boot. For that, the dynamic percpu chunk that serves early allocations needs to be large enough to satisfy the creation of the kmalloc caches. However, the current BUILD_BUG_ON() in alloc_kmem_cache_cpus() doesn't take into account that the kmalloc cache array has NR_KMALLOC_TYPES types. Fix that with the correct calculation. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-21	percpu: adjust the value of PERCPU_DYNAMIC_EARLY_SIZE	Baoquan He
	LKP reported the build failure below on the following patch "mm/slub, percpu: correct the calculation of early percpu allocation size": ~~~~~~ In file included from <command-line>: In function 'alloc_kmem_cache_cpus', inlined from 'kmem_cache_open' at mm/slub.c:4340:6: >> include/linux/compiler_types.h:357:45: error: call to '__compiletime_assert_474' declared with attribute error: BUILD_BUG_ON failed: PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu) 357 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ~~~~~~ From the kernel config file provided by LKP, the build was done on arm64 with the following Kconfig items enabled: CONFIG_ZONE_DMA=y CONFIG_SLUB_CPU_PARTIAL=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_SLUB_STATS=y CONFIG_ARM64_PAGE_SHIFT=16 CONFIG_ARM64_64K_PAGES=y Then we have: NR_KMALLOC_TYPES: 4, KMALLOC_SHIFT_HIGH: 17, sizeof(struct kmem_cache_cpu): 184. Their product is 12512, which is bigger than PERCPU_DYNAMIC_EARLY_SIZE (12K, i.e. 12288). Hence, the BUILD_BUG_ON in alloc_kmem_cache_cpus() is triggered. Earlier, in commit 099a19d91ca4 ("percpu: allow limited allocation before slab is online"), PERCPU_DYNAMIC_EARLY_SIZE was introduced and set to 12K, which was equal to the then PERCPU_DYNAMIC_RESERVE. Later, in commit 1a4d76076cda ("percpu: implement asynchronous chunk population"), PERCPU_DYNAMIC_RESERVE was increased by 8K, while PERCPU_DYNAMIC_EARLY_SIZE was kept unchanged. So increase PERCPU_DYNAMIC_EARLY_SIZE by 8K too, to accommodate SLUB's requirement. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
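The arithmetic behind the failure can be re-checked at compile time. This sketch plugs in the values quoted from the LKP config (the SKETCH_* macro names are hypothetical, not kernel symbols) and mirrors the BUILD_BUG_ON() condition: 4 * 17 * 184 = 12512 exceeds the old 12K limit by 224 bytes but fits in the new 20K.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the LKP arm64 (64K pages) report quoted above. */
#define SKETCH_NR_KMALLOC_TYPES		4
#define SKETCH_KMALLOC_SHIFT_HIGH	17
#define SKETCH_KMEM_CACHE_CPU_SIZE	184	/* sizeof(struct kmem_cache_cpu) in that config */

#define OLD_EARLY_SIZE	(12 << 10)	/* 12K = 12288 bytes */
#define NEW_EARLY_SIZE	(20 << 10)	/* 12K + 8K = 20480 bytes */

/* The right-hand side of the BUILD_BUG_ON() condition: 4 * 17 * 184 = 12512. */
#define EARLY_NEED	(SKETCH_NR_KMALLOC_TYPES * SKETCH_KMALLOC_SHIFT_HIGH * \
			 SKETCH_KMEM_CACHE_CPU_SIZE)

/* The old limit satisfies the failure condition; the new one does not. */
static_assert(OLD_EARLY_SIZE < EARLY_NEED, "12K is too small for this config");
static_assert(!(NEW_EARLY_SIZE < EARLY_NEED), "20K covers this config");
```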
2022-11-11	mm/slub: extend redzone check to extra allocated kmalloc space than requested	Feng Tang
	kmalloc rounds the request size up to a fixed size (mostly a power of 2), so there can be extra space beyond what was requested, whose size is the actual buffer size minus the original request size. To better detect out-of-bounds access to or abuse of this space, add a redzone sanity check for it. In the current kernel, some kmalloc users already know about this space and utilize it after calling ksize() to learn the real size of the allocated buffer. So skip the sanity check for objects on which ksize() has been called, treating them as legitimate users. Kees Cook is working on sanitizing all these use cases by using kmalloc_size_roundup() to avoid ambiguous usage, and once that is done, this special handling for ksize() can be removed. In some cases, the free pointer could be saved inside the latter part of the object data area, which may overlap the redzone part (for small kmalloc objects). As suggested by Hyeonggon Yoo, force the free pointer to be in the metadata area when kmalloc redzone debugging is enabled, so that all kmalloc objects are covered by the redzone check. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
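A minimal userspace sketch of the idea: fill the bytes between the requested size and the object size with a known pattern at allocation time, then verify the pattern later (for example at free time). It assumes the SLUB_RED_ACTIVE byte (0xcc) is used as the pattern, uses hypothetical function names, and does not model the ksize() exemption or the free-pointer placement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the slack is filled with SLUB_RED_ACTIVE (0xcc). */
#define SKETCH_RED_ACTIVE 0xcc

/* Poison the slack between the requested size and the object size. */
static void sketch_redzone_fill(unsigned char *obj, size_t orig_size, size_t object_size)
{
	memset(obj + orig_size, SKETCH_RED_ACTIVE, object_size - orig_size);
}

/*
 * Return the offset of the first overwritten slack byte, or object_size
 * if the redzone is intact.
 */
static size_t sketch_redzone_check(const unsigned char *obj, size_t orig_size, size_t object_size)
{
	for (size_t i = orig_size; i < object_size; i++) {
		if (obj[i] != SKETCH_RED_ACTIVE)
			return i;
	}
	return object_size;
}
```

Writes inside the requested area are ignored; only a write past orig_size (but still inside the rounded-up object) is reported.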
2022-11-10	mm: kasan: Extend kasan_metadata_size() to also cover in-object size	Feng Tang
	When kasan is enabled for slab/slub, it may save kasan's free_meta data in the former part of the slab object data area in the object's free path, which works fine. There is an ongoing effort to extend slub's debug functionality to redzone the latter part of the kmalloc object area, and when both debug features are enabled there can be a conflict, especially when the kmalloc object is small, as caught by the 0Day bot [1]. To solve it, the slub code needs to know the size of kasan's in-object metadata. Currently, the existing kasan_metadata_size() returns the size of kasan's metadata inside slub's metadata area, so extend it to also cover the in-object metadata size by adding a boolean flag 'in_object'. There is no functional change to existing code logic. [1] https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-10	mm/slub: only zero requested size of buffer for kzalloc when debug enabled	Feng Tang
	kzalloc/kmalloc rounds the request size up to a fixed size (mostly a power of 2), so the allocated memory can be larger than requested. Currently the kzalloc family of APIs zeroes all the allocated memory. To detect out-of-bounds usage of the extra allocated memory, only zero the requested part, so that a redzone sanity check can be added to the extra space later. kzalloc users who call ksize() later and utilize this extra space should be aware that the space is no longer zeroed when debugging is enabled. (Thanks to Kees Cook's effort to sanitize all ksize() use cases [1], this won't be a big issue.) [1] https://lore.kernel.org/all/20220922031013.2150682-1-keescook@chromium.org/#r Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-10	tools/vm/slabinfo: indicates the cause of the EACCES error	Rong Tao
	If slabinfo is not run as superuser, get_obj_and_str("slabs", &t) in read_slab_dir() returns 0 because fopen() fails (sometimes with EACCES), which causes slabcache() to return directly without reporting any error. Instead of exiting successfully ($?=0) without printing anything, tell the user about the EACCES problem. For example: $ ./slabinfo Permission denied, Try using superuser <== What this submission did $ sudo ./slabinfo Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg Acpi-Namespace 5950 48 286.7K 65/0/5 85 0 0 99 Acpi-Operand 13664 72 999.4K 231/0/13 56 0 0 98 ... Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
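The general pattern is to keep errno from a failed fopen() and report it instead of treating the failure as "no data". The helper below is a hypothetical sketch, not the code in tools/vm/slabinfo.c; it returns -errno on failure and prints a hint for EACCES similar to the output shown above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open a slab sysfs attribute, or report why it
 * failed instead of silently returning "0 objects".
 * Returns 0 on success, -errno on failure.
 */
static int sketch_open_slab_attr(const char *path, FILE **out)
{
	FILE *f = fopen(path, "r");

	if (!f) {
		int err = errno;	/* save before any other libc call */

		if (err == EACCES)
			fprintf(stderr, "%s, Try using superuser\n", strerror(err));
		else
			fprintf(stderr, "%s: %s\n", path, strerror(err));
		return -err;
	}
	*out = f;
	return 0;
}
```

The caller can then exit with a non-zero status when the result is negative, so scripts no longer see $?=0 for a run that printed nothing.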
2022-11-07	mm, slab: remove duplicate kernel-doc comment for ksize()	Vlastimil Babka
	Akira reports: > "make htmldocs" reports duplicate C declaration of ksize() as follows: > /linux/Documentation/core-api/mm-api:43: ./mm/slab_common.c:1428: WARNING: Duplicate C declaration, also defined at core-api/mm-api:212. > Declaration is '.. c:function:: size_t ksize (const void *objp)'. > This is due to the kernel-doc comment for ksize() declaration added in > include/linux/slab.h by commit 05a940656e1e ("slab: Introduce > kmalloc_size_roundup()"). There is an older kernel-doc comment for ksize() definition in mm/slab_common.c, which is not only duplicated, but also contradicts the new one - the additional storage discovered by ksize() should not be used by callers anymore. Delete the old kernel-doc. Reported-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/all/d33440f6-40cf-9747-3340-e54ffaf7afb8@gmail.com/ Fixes: 05a940656e1e ("slab: Introduce kmalloc_size_roundup()") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-06	mm/slab_common: Restore passing "caller" for tracing	Kees Cook
	The "caller" argument was accidentally being ignored in a few places that were recently refactored. Restore these "caller" arguments, instead of _RET_IP_. Fixes: 11e9734bcb6a ("mm/slab_common: unify NUMA and UMA version of tracepoints") Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-04	mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()	Vlastimil Babka
	For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is a build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call. However, since commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") this path fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled). We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant does, but that would inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of an extra function call (to kmalloc_trace()) seems like the lesser evil. It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1], so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway. [1] https://lore.kernel.org/linux-mm/20221101222520.never.109-kees@kernel.org/T/#m20ecf14390e406247bde0ea9cce368f469c539ed Link: https://lore.kernel.org/all/097d8fba-bd10-a312-24a3-a4068c4f424c@suse.cz/ Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-03	mm/slab_common: repair kernel-doc for __ksize()	Lukas Bulwahn
	Commit 445d41d7a7c1 ("Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next") resolved a conflict of two concurrent changes to __ksize(). However, it did not adjust the kernel-doc comment of __ksize(), even though the argument of __ksize() was renamed. Hence, ./scripts/kernel-doc -none mm/slab_common.c warns about it. Adjust the kernel-doc comment for __ksize() for make W=1 happiness. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24	mm/slub: perform free consistency checks before call_rcu	Vlastimil Babka
	For SLAB_TYPESAFE_BY_RCU caches we use call_rcu to perform empty slab freeing. The rcu callback rcu_free_slab() calls __free_slab() that currently includes checking the slab consistency for caches with SLAB_CONSISTENCY_CHECKS flags. This check needs the slab->objects field to be intact. Because in the next patch we want to allow rcu_head in struct slab to become larger in debug configurations and thus potentially overwrite more fields through a union than slab_list, we want to limit the fields used in rcu_free_slab(). Thus move the consistency checks to free_slab() before call_rcu(). This can be done safely even for SLAB_TYPESAFE_BY_RCU caches where accesses to the objects can still occur after freeing them. As a result, only the slab->slab_cache field has to be physically separate from rcu_head for the freeing callback to work. We also save some cycles in the rcu callback for caches with consistency checks enabled. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-10-24	mm/slab: Annotate kmem_cache_node->list_lock as raw	Jiri Kosina
	The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT. That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds. Therefore annotate it as a raw lock in order to get rid of the lockdep warning below. ============================= [ BUG: Invalid wait context ] 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted ----------------------------- swapper/3/0 is trying to lock: ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0 other info that might help us debug this: context-{2:2} no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: <IRQ> dump_stack_lvl+0x6b/0x9d __lock_acquire+0x1519/0x1730 ? build_sched_domains+0x4bd/0x1590 ? __lock_acquire+0xad2/0x1730 lock_acquire+0x294/0x340 ? do_drain+0x57/0xb0 ? sched_clock_tick+0x41/0x60 _raw_spin_lock+0x2c/0x40 ? do_drain+0x57/0xb0 do_drain+0x57/0xb0 __flush_smp_call_function_queue+0x138/0x220 __sysvec_call_function+0x4f/0x210 sysvec_call_function+0x4b/0x90 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 RIP: 0010:mwait_idle+0x5e/0x80 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000 ? default_idle_call+0x1a/0xa0 default_idle_call+0x4b/0xa0 do_idle+0x1f1/0x2c0 ? _raw_spin_unlock_irqrestore+0x56/0x70 cpu_startup_entry+0x19/0x20 start_secondary+0x122/0x150 secondary_startup_64_no_verify+0xce/0xdb </TASK> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24	mm/slub: remove dead code for debug caches on deactivate_slab()	Hyeonggon Yoo
	After commit c7323a5ad0786 ("mm/slub: restrict sysfs validation to debug caches and make it safe"), SLUB never installs percpu slab for debug caches and thus never deactivates percpu slab for them. Since only debug caches use the full list, SLUB no longer deactivates to full list. Remove dead code in deactivate_slab(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24	mm: Make failslab writable again	Alexander Atanasov
	In commit 060807f841ac ("mm, slub: make remaining slub_debug related attributes read-only"), failslab was made read-only. I think it became collateral damage alongside the two other options, for which the reasons are perfectly valid. Here is why: - sanity_checks and trace are slab-internal debug options, while failslab is used for fault injection. - For fault injection, which by presumption is random, it does not matter if it is not set atomically. And you need to set at least one more option to trigger fault injection. - In a testing scenario you may need to change it at runtime, for example with module loading: you test all allocations limited by the space option, then you move to testing only your module's own slabs. - When set by command-line flags, it effectively disables all cache merges. Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24	mm: slub: make slab_sysfs_init() a late_initcall	Rasmus Villemoes
	Currently, slab_sysfs_init() is an __initcall aka device_initcall. It is rather time-consuming; on my board it takes around 11ms. That's about 1% of the time budget I have from U-Boot letting go until Linux must assume responsibility for keeping the external watchdog happy. There's no particular reason this would need to run at device_initcall time, so instead make it a late_initcall to allow vital functionality to get started a bit sooner. This actually ends up winning more than just those 11ms, because the slab caches that get created during other device_initcalls (and before my watchdog device gets probed) now don't end up doing the somewhat expensive sysfs_slab_add() themselves. Some example lines (with initcall_debug set) before/after: Before: initcall ext4_init_fs+0x0/0x1ac returned 0 after 1386 usecs initcall journal_init+0x0/0x138 returned 0 after 517 usecs initcall init_fat_fs+0x0/0x68 returned 0 after 294 usecs After: initcall ext4_init_fs+0x0/0x1ac returned 0 after 240 usecs initcall journal_init+0x0/0x138 returned 0 after 32 usecs initcall init_fat_fs+0x0/0x68 returned 0 after 18 usecs Altogether, this means I now get to pet the watchdog around 17ms sooner. [Of course, the time the other initcalls save is instead spent in slab_sysfs_init(), which goes from 11ms to 16ms, so there's no overall change in boot time.] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24	mm: slub: remove dead and buggy code from sysfs_slab_add()	Rasmus Villemoes
	The function sysfs_slab_add() has two callers: One is slab_sysfs_init(), which first initializes slab_kset, and only when that succeeds sets slab_state to FULL, and then proceeds to call sysfs_slab_add() for all previously created slabs. The other is __kmem_cache_create(), but only after an if (slab_state <= UP) return 0; check. So in other words, sysfs_slab_add() is never called while slab_kset (aka the return value of cache_kset()) is NULL. And this is just as well, because if we ever did take this path and called kobject_init(&s->kobj), and then later, when called again from slab_sysfs_init(), ended up calling kobject_init_and_add(), we would hit if (kobj->state_initialized) { /* do not error out as sometimes we can recover */ pr_err("kobject (%p): tried to init an initialized object, something is seriously wrong.\n", kobj); dump_stack(); } in kobject.c. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-23	Linux 6.1-rc2 (tag: v6.1-rc2)	Linus Torvalds

2022-10-23	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "RISC-V: - Fix compilation without RISCV_ISA_ZICBOM - Fix kvm_riscv_vcpu_timer_pending() for Sstc ARM: - Fix a bug preventing restoring an ITS containing mappings for very large and very sparse device topology - Work around a relocation handling error when compiling the nVHE object with profile optimisation - Fix for stage-2 invalidation holding the VM MMU lock for too long by limiting the walk to the largest block mapping size - Enable stack protection and branch profiling for VHE - Two selftest fixes x86: - add compat implementation for KVM_X86_SET_MSR_FILTER ioctl selftests: - synchronize includes between include/uapi and tools/include/uapi" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: tools: include: sync include/api/linux/kvm.h KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter() kvm: Add support for arch compat vm ioctls RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc RISC-V: Fix compilation without RISCV_ISA_ZICBOM KVM: arm64: vgic: Fix exit condition in scan_its_table() KVM: arm64: nvhe: Fix build with profile optimization KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test KVM: arm64: selftests: Fix multiple versions of GIC creation KVM: arm64: Enable stack protection and branch profiling for VHE KVM: arm64: Limit stage2_apply_range() batch size to largest block KVM: arm64: Work out supported block level at compile time
2022-10-23	Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()"	Jason A. Donenfeld
	This reverts commit 72a95859728a7866522e6633818bebc1c2519b17. It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances, which use the syscon driver. Little-endian is not affected, which suggests it is important to handle regmap_get_val_endian() in this function after all. Fixes: 72a95859728a ("mfd: syscon: Remove repetition of the regmap_get_val_endian()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Lee Jones <lee@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-23	kernel/utsname_sysctl.c: Fix hostname polling	Linus Torvalds
	Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added a new entry to the uts_kern_table[] array, but didn't update the UTS_PROC_xyz enumerators of older entries, breaking anything that used them. Which is admittedly not many cases: it's really just the two uses of uts_proc_notify() in kernel/sys.c. But apparently journald-systemd actually uses this to detect hostname changes. Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Cc: Petr Vorel <pvorel@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
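The commit addresses a mismatch between the UTS_PROC_* enumerators and the positions of the entries in uts_kern_table[]. A general C pattern that rules out this kind of drift by construction is to index the table with designated initializers. The sketch below uses hypothetical names and plain strings rather than struct ctl_table entries; it is not the upstream change.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the UTS_PROC_* / uts_kern_table[] pairing. */
enum sketch_uts_proc {
	SKETCH_UTS_PROC_ARCH,
	SKETCH_UTS_PROC_OSTYPE,
	SKETCH_UTS_PROC_OSRELEASE,
	SKETCH_UTS_PROC_VERSION,
	SKETCH_UTS_PROC_HOSTNAME,
	SKETCH_UTS_PROC_DOMAINNAME,
	SKETCH_UTS_PROC_COUNT
};

/*
 * Designated initializers bind each entry to its enumerator, so adding
 * an entry at the front cannot silently shift the indices of the others.
 */
static const char *const sketch_uts_table[SKETCH_UTS_PROC_COUNT] = {
	[SKETCH_UTS_PROC_ARCH]       = "arch",
	[SKETCH_UTS_PROC_OSTYPE]     = "ostype",
	[SKETCH_UTS_PROC_OSRELEASE]  = "osrelease",
	[SKETCH_UTS_PROC_VERSION]    = "version",
	[SKETCH_UTS_PROC_HOSTNAME]   = "hostname",
	[SKETCH_UTS_PROC_DOMAINNAME] = "domainname",
};

/* Look up an entry by enumerator, the way a notifier looks up the table by index. */
static const char *sketch_uts_name(enum sketch_uts_proc which)
{
	return sketch_uts_table[which];
}
```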
2022-10-23	Merge tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds
	Pull perf fixes from Borislav Petkov: - Fix raw data handling when perf events are used in bpf - Rework how SIGTRAPs get delivered to events to address a bunch of problems with it. Add a selftest for that too * tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bpf: Fix sample_flags for bpf_perf_event_output selftests/perf_events: Add a SIGTRAP stress test with disables perf: Fix missing SIGTRAPs
2022-10-23	Merge tag 'sched_urgent_for_v6.1_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Adjust code to not trip up CFI - Fix sched group cookie matching * tag 'sched_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Introduce struct balance_callback to avoid CFI mismatches sched/core: Fix comparison in sched_group_cookie_match()
2022-10-23	Merge tag 'objtool_urgent_for_v6.1_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Fix ORC stack unwinding when GCOV is enabled * tag 'objtool_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix unreliable stack dump with gcov
2022-10-23	Merge tag 'x86_urgent_for_v6.0_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "As is usually the case right after a major release, the tip urgent branches accumulate a couple more fixes than normal. And here is the x86 urgent pile, which is a bit bigger. - Use the correct CPU capability clearing function on the error path in Intel perf LBR - A CFI fix to ftrace along with a simplification - Adjust handling of zero capacity bit mask for resctrl cache allocation on AMD - A fix to the AMD microcode loader to attempt patch application on every logical thread - A couple of topology fixes to handle CPUID leaf 0x1f enumeration info properly - Drop a -mabi=ms compiler option check as both compilers support it now anyway - A couple of fixes to how the initial, statically allocated FPU buffer state is set up and how it interacts with dynamic states at runtime" * tag 'x86_urgent_for_v6.0_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap() ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph() x86/ftrace: Remove ftrace_epilogue() x86/resctrl: Fix min_cbm_bits for AMD x86/microcode/AMD: Apply the patch early on every logical thread x86/topology: Fix duplicated core ID within a package x86/topology: Fix multiple packages shown on a single-package system hwmon/coretemp: Handle large core ID value x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB x86/fpu: Exclude dynamic states from init_fpstate x86/fpu: Fix the init_fpstate size check with the actual size x86/fpu: Configure init_fpstate attributes orderly
2022-10-23	Merge tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring follow-up from Jens Axboe: "Currently zero-copy has an automatic fallback to normal transmit, and it was decided that it'd be cleaner to return an error instead if the socket type doesn't support it. Zero-copy does work with UDP and TCP, it's more of a future proofing kind of thing (eg for samba)" * tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux: io_uring/net: fail zc sendmsg when unsupported by socket io_uring/net: fail zc send when unsupported by socket net: flag sockets supporting msghdr originated zerocopy
2022-10-22	Merge tag 'hwmon-for-v6.1-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - corsair-psu: Fix typo in USB id description, and add USB ID for new PSU - pwm-fan: Fix fan power handling when disabling fan control * tag 'hwmon-for-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (corsair-psu) Add USB id of the new HX1500i psu hwmon: (pwm-fan) Explicitly switch off fan power when setting pwm1_enable to 0 hwmon: (corsair-psu) fix typo in USB id description
2022-10-22	Merge tag 'i2c-for-6.1-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "RPM fix for qcom-cci, platform module alias for xiic, build warning fix for mlxbf, typo fixes in comments" * tag 'i2c-for-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mlxbf: depend on ACPI; clean away ifdeffage i2c: fix spelling typos in comments i2c: qcom-cci: Fix ordering of pm_runtime_xx and i2c_add_adapter i2c: xiic: Add platform module alias
2022-10-22	Merge tag 'pci-v6.1-fixes-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fixes from Bjorn Helgaas: - Revert a simplification that broke pci-tegra due to a masking error - Update MAINTAINERS for Kishon's email address change and TI DRA7XX/J721E maintainer change * tag 'pci-v6.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Update Kishon's email address in PCI endpoint subsystem MAINTAINERS: Add Vignesh Raghavendra as maintainer of TI DRA7XX/J721E PCI driver Revert "PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro"
2022-10-22	Merge tag 'media/v6.1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull missed media updates from Mauro Carvalho Chehab: "It seems I screwed-up my previous pull request: it ends up that only half of the media patches that were in linux-next got merged in -rc1. The script which creates the signed tags silently failed due to the 5.19->6.0 version change, so it ended up generating a tag with incomplete stuff. So here are the missing parts: - a DVB core security fix - lots of fixes and cleanups for atomisp staging driver - old drivers that are VB1 are being moved to staging to be deprecated - several driver updates - mostly for embedded systems, but there are also some things addressing issues with some PC webcams, in the UVC video driver" * tag 'media/v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (163 commits) media: sun6i-csi: Move csi buffer definition to main header file media: sun6i-csi: Introduce and use video helper functions media: sun6i-csi: Add media ops with link notify callback media: sun6i-csi: Remove controls handler from the driver media: sun6i-csi: Register the media device after creation media: sun6i-csi: Pass and store csi device directly in video code media: sun6i-csi: Tidy up video code media: sun6i-csi: Tidy up v4l2 code media: sun6i-csi: Tidy up Kconfig media: sun6i-csi: Use runtime pm for clocks and reset media: sun6i-csi: Define and use variant to get module clock rate media: sun6i-csi: Always set exclusive module clock rate media: sun6i-csi: Tidy up platform code media: sun6i-csi: Refactor main driver data structures media: sun6i-csi: Define and use driver name and (reworked) description media: cedrus: Add a Kconfig dependency on RESET_CONTROLLER media: sun8i-rotate: Add a Kconfig dependency on RESET_CONTROLLER media: sun8i-di: Add a Kconfig dependency on RESET_CONTROLLER media: sun4i-csi: Add a Kconfig dependency on RESET_CONTROLLER media: sun6i-csi: Add a Kconfig dependency on RESET_CONTROLLER ...
2022-10-22	io_uring/net: fail zc sendmsg when unsupported by socket	Pavel Begunkov
	The previous patch fails zerocopy send requests for protocols that don't support it; do the same for zerocopy sendmsg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22	io_uring/net: fail zc send when unsupported by socket	Pavel Begunkov
	If a protocol doesn't support zerocopy, it will silently fall back to copying. This type of behaviour has always been a source of trouble, so it's better to fail such requests instead. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22	net: flag sockets supporting msghdr originated zerocopy	Pavel Begunkov
	We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22	hwmon: (corsair-psu) Add USB id of the new HX1500i psu	Wilken Gottwalt
	Also update the documentation accordingly. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net> Link: https://lore.kernel.org/r/Y0FghqQCHG/cX5Jz@monster.localdomain Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-22	tools: include: sync include/api/linux/kvm.h	Paolo Bonzini
	Provide a definition of KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option") Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22	KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER	Alexander Graf
	The KVM_X86_SET_MSR_FILTER ioctl contains a pointer in the passed-in struct, which means it has a different struct size depending on whether it gets called from 32bit or 64bit code. This patch introduces compat code that converts from the 32bit struct to its 64bit counterpart, which then gets used going forward internally. With this applied, 32bit QEMU can successfully set MSR bitmaps when running on 64bit kernels. Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com> Fixes: 1a155254ff937 ("KVM: x86: Introduce MSR filtering") Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20221017184541.2658-4-graf@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
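The compat handler is needed because a struct that embeds a pointer has a different size and layout for 32-bit and 64-bit callers. The sketch below uses hypothetical, simplified structs (range_compat, range_native and range_from_compat() are illustrative and not the real KVM ABI) to show the conversion step: the 32-bit pointer value is widened into the native struct, so the rest of the code only has to handle one layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; not the real kvm_msr_filter ABI.
 * A 32-bit caller stores the user pointer in 4 bytes, a 64-bit caller in
 * 8, so the two structs differ in size. */
struct range_compat {       /* layout as seen from 32-bit userspace */
    uint32_t flags;
    uint32_t nmsrs;
    uint32_t base;
    uint32_t bitmap;        /* 32-bit pointer value */
};

struct range_native {       /* layout as seen from 64-bit userspace */
    uint32_t flags;
    uint32_t nmsrs;
    uint32_t base;
    uint64_t bitmap;        /* full-width pointer value */
};

/* The size mismatch is exactly why one handler cannot serve both ABIs. */
static_assert(sizeof(struct range_compat) != sizeof(struct range_native),
              "compat and native layouts are expected to differ");

/* Copy each field into the native layout so shared code sees one format. */
static struct range_native range_from_compat(const struct range_compat *c)
{
    struct range_native n = {
        .flags = c->flags,
        .nmsrs = c->nmsrs,
        .base = c->base,
        .bitmap = (uint64_t)c->bitmap, /* zero-extend the 32-bit value */
    };
    return n;
}
```

In kernel code the 32-bit value would normally become a user pointer through a helper such as compat_ptr() rather than a plain integer cast; the sketch keeps it as an integer to stay self-contained.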
2022-10-22	KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()	Alexander Graf
	In the next patch we want to introduce a second caller to set_msr_filter() which constructs its own filter list on the stack. Refactor the original function so it takes it as argument instead of reading it through copy_from_user(). Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20221017184541.2658-3-graf@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22	kvm: Add support for arch compat vm ioctls	Alexander Graf
	We will introduce the first architecture specific compat vm ioctl in the next patch. Add all necessary boilerplate to allow architectures to override compat vm ioctls when necessary. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20221017184541.2658-2-graf@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22	Merge tag 'kvm-riscv-fixes-6.1-1' of https://github.com/kvm-riscv/linux into ↵	Paolo Bonzini
	HEAD KVM/riscv fixes for 6.1, take #1 - Fix compilation without RISCV_ISA_ZICBOM - Fix kvm_riscv_vcpu_timer_pending() for Sstc