2024-10-31powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel()Naveen N Rao
Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced bpf_jit_emit_func_call_hlp() to handle calls out to the module region, where bpf progs are generated. The only difference now between bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in the handling of the initial pass, where the target function address is not known. Fold that logic into bpf_jit_emit_func_call_hlp() and rename it to bpf_jit_emit_func_call_rel() to simplify bpf function call JIT code. We don't actually need to load/restore TOC across a call out to a different kernel helper or to a different bpf program since they all work with the kernel TOC. We only need to do it if we have to call out to a module function. So, guard TOC load/restore with appropriate conditions. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-10-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Move ftrace stub used for init text before _einittextNaveen N Rao
Move the ftrace stub used to cover inittext before _einittext so that it is within kernel text, as seen through core_kernel_text(). This is required for a subsequent change to ftrace. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-9-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Skip instruction patching if the instructions are the sameNaveen N Rao
To simplify upcoming changes to ftrace, add a check to skip actual instruction patching if the old and new instructions are the same. We still validate that the instruction is what we expect, but don't actually patch the same instruction again. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-8-hbathini@linux.ibm.com
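The idea can be sketched as follows (an editor's illustration with simplified error handling, not the actual patch; the real change lives in arch/powerpc/kernel/trace/ftrace.c):

	static int ftrace_modify_code_sketch(unsigned long ip, ppc_inst_t old, ppc_inst_t new)
	{
		ppc_inst_t op;

		/* Read the current instruction */
		if (copy_inst_from_kernel_nofault(&op, (u32 *)ip))
			return -EFAULT;

		/* Still validate that it is what we expect */
		if (!ppc_inst_equal(op, old))
			return -EINVAL;

		/* Same instruction: nothing to patch */
		if (ppc_inst_equal(old, new))
			return 0;

		return patch_instruction((u32 *)ip, new);
	}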
2024-10-31powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftraceNaveen N Rao
Pointer to struct module is only relevant for ftrace records belonging to kernel modules. Having this field in dyn_arch_ftrace wastes memory for all ftrace records belonging to the kernel. Remove the field in favour of looking up the module from the ftrace record address, similar to other architectures. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-7-hbathini@linux.ibm.com
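The described lookup can be sketched with the generic module API (illustrative only; the function name is the editor's):

	static struct module *ftrace_lookup_module_sketch(struct dyn_ftrace *rec)
	{
		struct module *mod;

		/* __module_text_address() must be called with preemption disabled */
		preempt_disable();
		mod = __module_text_address(rec->ip);
		preempt_enable();

		return mod;
	}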
2024-10-31powerpc/module_64: Convert #ifdef to IS_ENABLED()Naveen N Rao
Minor refactor for converting #ifdef to IS_ENABLED(). Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-6-hbathini@linux.ibm.com
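The pattern being applied, in general terms (do_foo() is a stand-in, not code from the patch):

	/* Before: the block disappears entirely when the option is off */
	#ifdef CONFIG_MPROFILE_KERNEL
		do_foo();
	#endif

	/* After: same generated code, but the branch is always parsed
	 * and type-checked by the compiler */
	if (IS_ENABLED(CONFIG_MPROFILE_KERNEL))
		do_foo();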
2024-10-31powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry codeNaveen N Rao
On 32-bit powerpc, gcc generates a three instruction sequence for function profiling:

	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back in when ftrace is actually enabled. The 'stw' instruction therefore is not necessary unless ftrace is enabled. Nop it out during ftrace init. When ftrace is enabled, we want the 'stw' so that stack unwinding works properly. Perform the same within the ftrace handler, similar to 64-bit powerpc. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-5-hbathini@linux.ibm.com
2024-10-31powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.xNaveen N Rao
Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel:

	mflr	r0
	std	r0, 16(r1)
	bl	_mcount

Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std' instruction. The store saved the return address in the LR save area in the caller stack frame for stack unwinding. However, with dynamic ftrace, we no longer have a call to _mcount on kernel boot when ftrace is not enabled. When ftrace is enabled, that store is performed within ftrace_caller(). As such, the additional 'std' instruction is redundant. Nop it out on kernel boot. With this change, we now use the same 2-instruction profiling sequence with both -mprofile-kernel, as well as -fpatchable-function-entry on 64-bit powerpc. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-4-hbathini@linux.ibm.com
2024-10-31powerpc/kprobes: Use ftrace to determine if a probe is at function entryNaveen N Rao
Rather than hard-coding the offset into a function to be used to determine if a kprobe is at function entry, use ftrace_location() to determine the ftrace location within the function and categorize all instructions till that offset to be function entry. For functions that cannot be traced, we fall back to using a fixed offset of 8 (two instructions) to categorize a probe as being at function entry for 64-bit elfv2, unless we are using pcrel. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-3-hbathini@linux.ibm.com
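Roughly, the check becomes the following (a simplified sketch of the logic, not the exact code):

	static bool probe_at_func_entry_sketch(unsigned long func_addr, unsigned long offset)
	{
		unsigned long ftrace_ip = ftrace_location(func_addr);

		/* Anything up to (and including) the ftrace location is "entry" */
		if (ftrace_ip)
			return func_addr + offset <= ftrace_ip;

		/* Untraceable function: fall back to two instructions for
		 * 64-bit ELFv2 (the pcrel case is elided here for brevity) */
		return offset <= 8;
	}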
2024-10-31powerpc/trace: Account for -fpatchable-function-entry support by toolchainNaveen N Rao
So far, we have relied on the fact that gcc supports both -mprofile-kernel, as well as -fpatchable-function-entry, and clang supports neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to decide which files to build. Clang has a feature request out [*] to implement -fpatchable-function-entry, and is unlikely to support -mprofile-kernel. Update our Makefile checks so that we pick up the correct files to build once clang picks up support for -fpatchable-function-entry. [*] https://github.com/llvm/llvm-project/issues/57031 Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-2-hbathini@linux.ibm.com
2024-10-29powerpc/64: Remove maple platformMichael Ellerman
The maple platform was added in 2004 [1], to support the "Maple" 970FX evaluation board. It was later used for IBM JS20/JS21 machines, as well as the Bimini machine, aka "Yellow Dog Powerstation". Sadly all those machines have passed into memory, and there's been no evidence for years that anyone is still using any of them. Remove the platform and related code. It can always be reinstated if there's interest. Note that this has no impact on support for 970FX based Power Macs. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=f0d068d65c5e555ffcfbc189de32598f6f00770c Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241013102957.548291-1-mpe@ellerman.id.au
2024-10-29powerpc/boot: Remove bogus reference to liloMichael Ellerman
The help text refers to lilo, but the install script does not run lilo and never has. The reference to lilo seems to have come originally from arch/ppc/Makefile, but it was not true there either. Remove it. Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://fosstodon.org/@kernellogger/113032940928131612 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009053806.135807-1-mpe@ellerman.id.au
2024-10-29powerpc/pseries: Fix dtl_access_lock to be a rw_semaphoreMichael Ellerman
The dtl_access_lock needs to be a rw_semaphore, a sleeping lock, because the code calls kmalloc() while holding it, which can sleep:

  # echo 1 > /proc/powerpc/vcpudispatch_stats
  BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 199, name: sh
  preempt_count: 1, expected: 0
  3 locks held by sh/199:
   #0: c00000000a0743f8 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x324/0x438
   #1: c0000000028c7058 (dtl_enable_mutex){+.+.}-{3:3}, at: vcpudispatch_stats_write+0xd4/0x5f4
   #2: c0000000028c70b8 (dtl_access_lock){+.+.}-{2:2}, at: vcpudispatch_stats_write+0x220/0x5f4
  CPU: 0 PID: 199 Comm: sh Not tainted 6.10.0-rc4 #152
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  Call Trace:
    dump_stack_lvl+0x130/0x148 (unreliable)
    __might_resched+0x174/0x410
    kmem_cache_alloc_noprof+0x340/0x3d0
    alloc_dtl_buffers+0x124/0x1ac
    vcpudispatch_stats_write+0x2a8/0x5f4
    proc_reg_write+0xf4/0x150
    vfs_write+0xfc/0x438
    ksys_write+0x88/0x148
    system_call_exception+0x1c4/0x5a0
    system_call_common+0xf4/0x258

Fixes: 06220d78f24a ("powerpc/pseries: Introduce rwlock to gatekeep DTLB usage") Tested-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Nysal Jan K.A <nysal@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20240819122401.513203-1-mpe@ellerman.id.au
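The shape of the conversion (editor's sketch; buffer size and error handling are simplified, not taken from the patch):

	/* was: static DEFINE_RWLOCK(dtl_access_lock); */
	static DECLARE_RWSEM(dtl_access_lock);

	static int enable_dtl_stats_sketch(void)
	{
		void *buf;

		down_write(&dtl_access_lock);		/* sleeping lock: kmalloc(GFP_KERNEL) is now legal */
		buf = kmalloc(4096, GFP_KERNEL);	/* size illustrative */
		up_write(&dtl_access_lock);

		if (!buf)
			return -ENOMEM;
		kfree(buf);	/* placeholder; the real code keeps the buffers */
		return 0;
	}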
2024-10-29powerpc/machdep: Drop include of dma-mapping.hMichael Ellerman
Drop the include of dma-mapping.h in machdep.h, replace it with forward declarations of struct device and struct pci_dev, and include time64.h and page.h which are required for time64_t and pgprot_t respectively. Add direct includes of some other headers to some files that were getting them via machdep.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009051826.132805-2-mpe@ellerman.id.au
2024-10-29powerpc/machdep: Drop include of seq_file.hMichael Ellerman
Drop the include of seq_file.h in machdep.h, replace it with a forward declaration of struct seq_file, which is all that's required. Add direct includes of seq_file.h to some files that were getting seq_file.h via machdep.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009051826.132805-1-mpe@ellerman.id.au
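The technique, in miniature (hypothetical trimmed-down struct; machdep.h does the same for its ppc_md members):

	/* was: #include <linux/seq_file.h> */
	struct seq_file;	/* a pointer-only use needs no full definition */

	struct machdep_calls_sketch {
		void (*show_cpuinfo)(struct seq_file *m);
	};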
2024-10-29powerpc/64: Drop IPI_PRIORITY from asm-offsetsMichael Ellerman
The last use of IPI_PRIORITY in asm was removed in commit 37f55d30df2e ("KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function"). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009051701.132282-1-mpe@ellerman.id.au
2024-10-23powerpc: Adjust adding stack protector flags to KBUILD_CFLAGS for clangNathan Chancellor
After fixing the HAVE_STACKPROTECTOR checks for clang's in-progress per-task stack protector support [1], the build fails during prepare0 because '-mstack-protector-guard-offset' has not been added to KBUILD_CFLAGS yet but the other '-mstack-protector-guard' flags have:

  clang: error: '-mstack-protector-guard=tls' is used without '-mstack-protector-guard-offset', and there is no default
  clang: error: '-mstack-protector-guard=tls' is used without '-mstack-protector-guard-offset', and there is no default
  make[4]: *** [scripts/Makefile.build:229: scripts/mod/empty.o] Error 1
  make[4]: *** [scripts/Makefile.build:102: scripts/mod/devicetable-offsets.s] Error 1

Mirror other architectures and add all '-mstack-protector-guard' flags to KBUILD_CFLAGS atomically during stack_protector_prepare, which resolves the issue and allows clang's implementation to fully work with the kernel. Cc: stable@vger.kernel.org # 6.1+ Link: https://github.com/llvm/llvm-project/pull/110928 [1] Reviewed-by: Keith Packard <keithp@keithp.com> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009-powerpc-fix-stackprotector-test-clang-v2-2-12fb86b31857@kernel.org
2024-10-23powerpc: Fix stack protector Kconfig test for clangNathan Chancellor
Clang's in-progress per-task stack protector support [1] does not work with the current Kconfig checks because '-mstack-protector-guard-offset' is not provided, unlike all other architecture Kconfig checks.

  $ fd Kconfig -x rg -l mstack-protector-guard-offset
  ./arch/arm/Kconfig
  ./arch/riscv/Kconfig
  ./arch/arm64/Kconfig

This produces an error from clang, which is interpreted as the flags not being supported at all when they really are.

  $ clang --target=powerpc64-linux-gnu \
      -mstack-protector-guard=tls \
      -mstack-protector-guard-reg=r13 \
      -c -o /dev/null -x c /dev/null
  clang: error: '-mstack-protector-guard=tls' is used without '-mstack-protector-guard-offset', and there is no default

This argument will always be provided by the build system, so mirror other architectures and use '-mstack-protector-guard-offset=0' for testing support, which fixes the issue for clang and does not regress support with GCC.

Even with the first problem addressed, the 32-bit test continues to fail because Kbuild uses the powerpc64le-linux-gnu target for clang and nothing flips the target to 32-bit, resulting in an error about an invalid register value:

  $ clang --target=powerpc64le-linux-gnu \
      -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 \
      -mstack-protector-guard-offset=0 \
      -x c -c -o /dev/null /dev/null
  clang: error: invalid value 'r2' in 'mstack-protector-guard-reg=', expected one of: r13

While GCC allows arbitrary registers, the implementation of '-mstack-protector-guard=tls' in LLVM shares the same code path as the user space thread local storage implementation, which uses a fixed register (2 for 32-bit and 13 for 64-bit), so the command line parsing enforces this limitation. Use the Kconfig macro '$(m32-flag)', which expands to '-m32' when supported, in the stack protector support cc-option call to properly switch the target to a 32-bit one, which matches what happens in Kbuild. While the 64-bit macro does not strictly need it, add the equivalent 64-bit option for symmetry. Cc: stable@vger.kernel.org # 6.1+ Link: https://github.com/llvm/llvm-project/pull/110928 [1] Reviewed-by: Keith Packard <keithp@keithp.com> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241009-powerpc-fix-stackprotector-test-clang-v2-1-12fb86b31857@kernel.org
2024-10-23book3s64/hash: Early detect debug_pagealloc size requirementRitesh Harjani (IBM)
Add a hash_supports_debug_pagealloc() helper to detect whether debug_pagealloc can be supported on hash or not. This checks both whether the debug_pagealloc config is enabled and whether the linear map fits within the rma_size/4 region. This can then be used early during htab_init_page_sizes() to decide the linear map pagesize if hash supports either debug_pagealloc or kfence. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/c33c6691b2a2cf619cc74ac100118ca4dbf21a48.1729271995.git.ritesh.list@gmail.com
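An assumed shape for the helper (editor's sketch; the real version sits in arch/powerpc/mm/book3s64/hash_utils.c and may differ):

	static bool hash_supports_debug_pagealloc_sketch(void)
	{
		unsigned long max_hash_count = ppc64_rma_size / 4;
		unsigned long hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;

		if (!debug_pagealloc_enabled())
			return false;

		/* one linear-map slot byte per page must fit in rma_size/4 */
		return hash_count <= max_hash_count;
	}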
2024-10-23book3s64/hash: Disable kfence if not early initRitesh Harjani (IBM)
Enable kfence on book3s64 hash only when early init is enabled. This is because kfence could cause the kernel linear map to be mapped at PAGE_SIZE level instead of 16M, which we don't want. Also, there is currently no way to: 1. create multiple page size entries for the SLB used for the kernel linear map, or 2. easily get the hash slot details after the page table mapping for the kernel linear map is set up. So even if kfence allocated its pool in late init, we would not be able to get the hash slot details for the kfence linear map. Thus this patch disables kfence on hash if kfence early init is not enabled. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/4a6eea8cfd1cd28fccfae067026bff30cbec1d4b.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/radix: Refactoring common kfence related functionsRitesh Harjani (IBM)
Both radix and hash on book3s need to detect whether kfence early init is enabled or not. Hash needs to disable kfence if early init is not enabled, because with kfence the linear map is mapped using PAGE_SIZE rather than 16M mappings, and we don't support multiple page sizes for the SLB entry used for the kernel linear map on book3s64. This patch refactors out the common functions required to detect whether kfence early init is enabled. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/f4a787224fbe5bb787158ace579780c0257f6602.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Add kfence functionalityRitesh Harjani (IBM)
Now that the linear map functionality of debug_pagealloc is made generic, enable kfence to use this generic infrastructure.

1. Define kfence related linear map variables.
   - u8 *linear_map_kf_hash_slots;
   - unsigned long linear_map_kf_hash_count;
   - DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
2. The linear map size allocated in the RMA region is quite small (KFENCE_POOL_SIZE >> PAGE_SHIFT), which is 512 bytes by default.
3. kfence pool memory is reserved using memblock_phys_alloc(), which can come from anywhere. (default 255 objects => ((1+255) * 2) << PAGE_SHIFT = 32MB)
4. The hash slot information for kfence memory gets added to the linear map in hash_linear_map_add_slot() (which also adds it for debug_pagealloc).

Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/5c2b61941b344077a2b8654dab46efa0322af3af.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Disable debug_pagealloc if it requires more memoryRitesh Harjani (IBM)
Make the size of the linear map to be allocated in the RMA region ppc64_rma_size / 4. If debug_pagealloc requires more memory than that, do not allocate any memory and disable debug_pagealloc. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/e1ef66f32a1fe63bcbb89d5c11d86c65beef5ded.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Make kernel_map_linear_page() genericRitesh Harjani (IBM)
Currently the kernel_map_linear_page() function assumes it is working on the linear_map_hash_slots array. But since later patches need a separate linear map array for kfence, make kernel_map_linear_page() take a linear map array and lock in its function arguments. This is needed to separate kfence from the debug_pagealloc infrastructure. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/5b67df7b29e68d7c78d6fc1f42d41137299bac6b.1729271995.git.ritesh.list@gmail.com
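In outline, the signature change looks like this (an editor's sketch; parameter names are illustrative and the body is elided):

	/* was: implicitly used linear_map_hash_slots/linear_map_hash_lock */
	static void kernel_map_linear_page_sketch(unsigned long vaddr, unsigned long idx,
						  u8 *slots, raw_spinlock_t *lock)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(lock, flags);
		/* ... insert the HPTE and record the slot in slots[idx] ... */
		raw_spin_unlock_irqrestore(lock, flags);
	}

	/* debug_pagealloc and (later) kfence each pass their own array and lock:
	 *   kernel_map_linear_page_sketch(vaddr, idx, linear_map_hash_slots,
	 *				   &linear_map_hash_lock);
	 */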
2024-10-23book3s64/hash: Refactor hash__kernel_map_pages() functionRitesh Harjani (IBM)
This refactors the hash__kernel_map_pages() function to call hash_debug_pagealloc_map_pages(). This will be useful when we add kfence support. No functional changes in this patch. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/0cb8ddcccdcf61ea06ab4d92aacd770c16cc0f2c.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Add hash_debug_pagealloc_alloc_slots() functionRitesh Harjani (IBM)
This adds a hash_debug_pagealloc_alloc_slots() function instead of open coding that in htab_initialize(). This is required since we will be separating the kfence functionality so that it does not depend upon debug_pagealloc. Now that everything required for debug_pagealloc is under an #ifdef config, bring the linear_map_hash_slots and linear_map_hash_count variables under the same config too. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/d1d5aabe1e4c693a983e59ccf3de08e3c28c5161.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Add hash_debug_pagealloc_add_slot() functionRitesh Harjani (IBM)
This adds a hash_debug_pagealloc_add_slot() function instead of open coding that in htab_bolt_mapping(). This is required since we will be separating the kfence functionality so that it does not depend upon debug_pagealloc. No functional change in this patch. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/026f0aaa1dddd89154dc8d20ceccfca4f63ccf79.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Refactor kernel linear map related callsRitesh Harjani (IBM)
This just brings all linear map related handling to one place instead of having those functions scattered in the hash_utils file, making it easier to review. No functional changes in this patch. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/56c610310aa50b5417976a39c5f15b78bc76c764.1729271995.git.ritesh.list@gmail.com
2024-10-23book3s64/hash: Remove kfence support temporarilyRitesh Harjani (IBM)
Kfence on book3s Hash on pseries is broken anyway. It fails to boot due to the RMA size limitation. That is because kfence with Hash uses the debug_pagealloc infrastructure, and debug_pagealloc allocates a linear map for the entire dram size instead of just the kfence relevant objects. This means for 16TB of DRAM it will require (16TB >> PAGE_SHIFT) which is 256MB, which is half of the RMA region on P8. The crash kernel reserves 256MB and we also need 2048 * 16KB * 3 for the emergency stack and some more for paca allocations. That means there is not enough memory for reserving the full linear map in the RMA region if the DRAM size is too big (>=16TB) (the issue is seen above 8TB with a 256MB crash kernel reservation). Now kfence does not require a linear memory map for the entire DRAM; it only needs it for kfence objects. So this patch temporarily removes the kfence functionality, since the debug_pagealloc code needs some refactoring. We will bring back kfence support on Hash in later patches. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/1761bc39674473c8878dedca15e0d9a0d3a1b528.1729271995.git.ritesh.list@gmail.com
2024-10-23powerpc/mm/fault: Fix kfence page fault reportingRitesh Harjani (IBM)
copy_from_kernel_nofault() can be called when doing read of /proc/kcore. /proc/kcore can have some unmapped kfence objects which, when read via copy_from_kernel_nofault(), can cause page faults. Since *_nofault() functions define their own fixup table for handling faults, use that instead of asking kfence to handle such faults. Hence we search the exception tables for the nip which generated the fault. If there is an entry, then we let the fixup table handler handle the page fault by returning an error from within ___do_page_fault(). This can be easily triggered if someone tries to dd from /proc/kcore, e.g. dd if=/proc/kcore of=/dev/null bs=1M

Some example false negatives:

  ===============================
  BUG: KFENCE: invalid read in copy_from_kernel_nofault+0x9c/0x1a0
  Invalid read at 0xc0000000fdff0000:
   copy_from_kernel_nofault+0x9c/0x1a0
   0xc00000000665f950
   read_kcore_iter+0x57c/0xa04
   proc_reg_read_iter+0xe4/0x16c
   vfs_read+0x320/0x3ec
   ksys_read+0x90/0x154
   system_call_exception+0x120/0x310
   system_call_vectored_common+0x15c/0x2ec

  BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0x9c/0x1a0
  Use-after-free read at 0xc0000000fe050000 (in kfence-#2):
   copy_from_kernel_nofault+0x9c/0x1a0
   0xc00000000665f950
   read_kcore_iter+0x57c/0xa04
   proc_reg_read_iter+0xe4/0x16c
   vfs_read+0x320/0x3ec
   ksys_read+0x90/0x154
   system_call_exception+0x120/0x310
   system_call_vectored_common+0x15c/0x2ec

Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/a411788081d50e3b136c6270471e35aba3dfafa3.1729271995.git.ritesh.list@gmail.com
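The gist of the fix (editor's sketch, read fault assumed; the real check is wired into ___do_page_fault()):

	static int kfence_fault_sketch(struct pt_regs *regs, unsigned long address)
	{
		/* A *_nofault() accessor registered a fixup for this nip:
		 * return an error so the fixup runs instead of reporting
		 * the access to KFENCE. */
		if (search_exception_tables(instruction_pointer(regs)))
			return -EFAULT;

		if (kfence_handle_page_fault(address, false, regs))
			return 0;	/* KFENCE handled (and reported) it */

		return -EFAULT;
	}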
2024-10-21powerpc/fadump: Move fadump_cma_init to setup_arch() after initmem_init()Ritesh Harjani (IBM)
During early init CMA_MIN_ALIGNMENT_BYTES can be PAGE_SIZE, since pageblock_order is still zero and it gets initialized later during initmem_init(), e.g.

  setup_arch() -> initmem_init() -> sparse_init() -> set_pageblock_order()

One such use case where this causes an issue is:

  early_setup() -> early_init_devtree() -> fadump_reserve_mem() -> fadump_cma_init()

This causes the CMA memory alignment check to be bypassed in cma_init_reserved_mem(). Then later cma_activate_area() can hit a VM_BUG_ON_PAGE(pfn & ((1 << order) - 1)) if the reserved memory area was not pageblock_order aligned. Fix it by moving fadump_cma_init() after initmem_init(), where other such CMA reservations also get called.

  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10010
  flags: 0x13ffff800000000(node=1|zone=0|lastcpupid=0x7ffff) CMA
  raw: 013ffff800000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(pfn & ((1 << order) - 1))
  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:778!
  Call Trace:
    __free_one_page+0x57c/0x7b0 (unreliable)
    free_pcppages_bulk+0x1a8/0x2c8
    free_unref_page_commit+0x3d4/0x4e4
    free_unref_page+0x458/0x6d0
    init_cma_reserved_pageblock+0x114/0x198
    cma_init_reserved_areas+0x270/0x3e0
    do_one_initcall+0x80/0x2f8
    kernel_init_freeable+0x33c/0x530
    kernel_init+0x34/0x26c
    ret_from_kernel_user_thread+0x14/0x1c

Fixes: 11ac3e87ce09 ("mm: cma: use pageblock_order as the single alignment") Suggested-by: David Hildenbrand <david@redhat.com> Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/3ae208e48c0d9cefe53d2dc4f593388067405b7d.1729146153.git.ritesh.list@gmail.com
2024-10-21powerpc/fadump: Reserve page-aligned boot_memory_size during fadump_reserve_memRitesh Harjani (IBM)
This patch refactors all CMA related initialization and alignment code to within fadump_cma_init() which gets called in the end. This also means that we keep [reserve_dump_area_start, boot_memory_size] page aligned during fadump_reserve_mem(). Then later in fadump_cma_init() we extract the aligned chunk and provide it to CMA. This inherently also fixes an issue in the current code where the reserve_dump_area_start is not aligned when the physical memory can have holes and the suitable chunk starts at an unaligned boundary. After this we should be able to call fadump_cma_init() independently later in setup_arch() where pageblock_order is non-zero. Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/805d6b900968fb9402ad8f4e4775597db42085c4.1729146153.git.ritesh.list@gmail.com
2024-10-21powerpc/fadump: Refactor and prepare fadump_cma_init for late initRitesh Harjani (IBM)
We don't use any return values from fadump_cma_init(), and fadump_reserve_mem(), from where fadump_cma_init() gets called today, already has the required checks. This patch therefore makes the function's return type void. Let's also handle extra cases, returning if fadump_supported is false or a dump is active, so that in later patches we can call fadump_cma_init() separately from setup_arch(). Acked-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/a2afc3d6481a87a305e89cfc4a3f3d2a0b8ceab3.1729146153.git.ritesh.list@gmail.com
2024-10-16Merge branch 'topic/vdso' into nextMichael Ellerman
Merge VDSO changes we're keeping in a topic branch, in case they conflict with other VDSO changes in flight.
2024-10-16powerpc/vdso: Flag VDSO64 entry points as functionsChristophe Leroy
On powerpc64, as shown below by readelf, vDSO function symbols have type NOTYPE.

  $ powerpc64-linux-gnu-readelf -a arch/powerpc/kernel/vdso/vdso64.so.dbg
  ELF Header:
    Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00
    Class:                             ELF64
    Data:                              2's complement, big endian
    Version:                           1 (current)
    OS/ABI:                            UNIX - System V
    ABI Version:                       0
    Type:                              DYN (Shared object file)
    Machine:                           PowerPC64
    Version:                           0x1
  ...
  Symbol table '.dynsym' contains 12 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
  ...
       1: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
  ...
       4: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
       5: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15

  Symbol table '.symtab' contains 56 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
  ...
      45: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
      46: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __kernel_getcpu
      47: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_clock_getres

To overcome that, commit ba83b3239e65 ("selftests: vDSO: fix vDSO symbols lookup for powerpc64") was applied to have selftests also look for NOTYPE symbols, but the correct fix is to flag VDSO entry points as functions. The original commit that brought VDSO support into powerpc/64 has the following explanation:

  Note that the symbols exposed by the vDSO aren't "normal" function symbols, apps can't be expected to link against them directly, the vDSO's are both seen as if they were linked at 0 and the symbols just contain offsets to the various functions. This is done on purpose to avoid a relocation step (ppc64 functions normally have descriptors with abs addresses in them). When glibc uses those functions, it's expected to use its own trampolines that know how to reach them.

The descriptors it's talking about are the OPD function descriptors used on ABI v1 (big endian). But it would be more correct for a text symbol to have type function, even if there's no function descriptor for it. glibc already has a special case for handling the VDSO symbols, which creates a fake opd pointing at the kernel symbol, so changing the VDSO symbol type to function shouldn't affect that. For ABI v2, there are no function descriptors and VDSO functions can safely have function type. So let's flag VDSO entry points as functions and revert the selftest change. Link: https://github.com/mpe/linux-fullhistory/commit/5f2dd691b62da9d9cc54b938f8b29c22c93cb805 Fixes: ba83b3239e65 ("selftests: vDSO: fix vDSO symbols lookup for powerpc64") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-By: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/b6ad2f1ee9887af3ca5ecade2a56f4acda517a85.1728512263.git.christophe.leroy@csgroup.eu
2024-10-16powerpc/vdso: Implement __arch_get_vdso_rng_data()Christophe Leroy
VDSO time functions do not call any other function, so they don't need to save/restore LR. However, retrieving the address of the VDSO data page requires using LR, hence saving then restoring it, which can be heavy on some CPUs. On the other hand, VDSO functions on powerpc are not standard functions and require a wrapper function to call C VDSO functions. That wrapper has to save and restore LR in order to call the C VDSO function, so retrieving the VDSO data page address in that wrapper doesn't require an additional save/restore of LR. For random VDSO functions it is a bit different. Because the function calls __arch_chacha20_blocks_nostack(), it saves and restores LR; retrieving the VDSO data page address can then be done there without an additional save/restore of LR. So let's implement __arch_get_vdso_rng_data() and simplify the wrapper. It starts paving the way for the day powerpc will implement a more standard ABI for VDSO functions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/a1a9bd0df508f1b5c04684b7366940577dfc6262.1727858295.git.christophe.leroy@csgroup.eu
2024-10-16powerpc/vdso: Add a page for non-time dataChristophe Leroy
The page containing VDSO time data is swapped with the one containing TIME namespace data when a process uses a non-root time namespace. For other data, like powerpc specific data and RNG data, that means tracking whether the time namespace is the root one or not to know which page to use. Simplify the logic by moving time data out of the first data page, so that the first data page, which contains everything else, always remains the first page. Time data is in the second or third page depending on the selected time namespace. While we are playing with the get_datapage macro, directly take the data offset into account inside the macro instead of adding that offset afterwards. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/0557d3ec898c1d0ea2fc59fa8757618e524c5d94.1727858295.git.christophe.leroy@csgroup.eu
2024-10-06Linux 6.12-rc2v6.12-rc2Linus Torvalds
2024-10-06Merge tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds
Pull Kbuild fixes from Masahiro Yamada:

 - Move non-boot built-in DTBs to the .rodata section

 - Fix Kconfig bugs

 - Fix maint scripts in the linux-image Debian package

 - Import some list macros to scripts/include/

* tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: deb-pkg: Remove blank first line from maint scripts
  kbuild: fix a typo dt_binding_schema -> dt_binding_schemas
  scripts: import more list macros
  kconfig: qconf: fix buffer overflow in debug links
  kconfig: qconf: move conf_read() before drawing tree pain
  kconfig: clear expr::val_is_valid when allocated
  kconfig: fix infinite loop in sym_calc_choice()
  kbuild: move non-boot built-in DTBs to .rodata section
2024-10-06Merge tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86Linus Torvalds
Pull x86 platform driver fixes from Hans de Goede:

 - Intel PMC fix for suspend/resume issues on some Sky and Kaby Lake laptops

 - Intel Diamond Rapids hw-id additions

 - Documentation and MAINTAINERS fixes

 - Some other small fixes

* tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors
  platform/x86: wmi: Update WMI driver API documentation
  platform/x86: dell-ddv: Fix typo in documentation
  platform/x86: dell-sysman: add support for alienware products
  platform/x86/intel: power-domains: Add Diamond Rapids support
  platform/x86: ISST: Add Diamond Rapids to support list
  platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake
  platform/x86: dell-laptop: Do not fail when encountering unsupported batteries
  MAINTAINERS: Update Intel In Field Scan(IFS) entry
  platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug
2024-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini:

 "ARM64:

   - Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail

   - Make sure that the host's vector length is capped by a value common to all CPUs

   - Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken

   - Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarily

  x86:

   - Fix compilation with KVM_INTEL=KVM_AMD=n

   - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use

  Selftests:

   - Fix compilation on non-x86 architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
2024-10-06Merge tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds
Pull powerpc fix from Michael Ellerman:

 - Allow r30 to be used in vDSO code generation of getrandom

Thanks to Jason A. Donenfeld.

* tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: allow r30 in vDSO code generation of getrandom
2024-10-07kbuild: deb-pkg: Remove blank first line from maint scriptsAaron Thompson
The blank line causes execve() to fail:

  # strace ./postinst
  execve("./postinst", ...) = -1 ENOEXEC (Exec format error)
  strace: exec: Exec format error
  +++ exited with 1 +++

However running the scripts via shell does work (at least with bash) because the shell attempts to execute the file as a shell script when execve() fails. Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions") Signed-off-by: Aaron Thompson <dev@aaront.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-07kbuild: fix a typo dt_binding_schema -> dt_binding_schemasXu Yang
If we follow "make help" to "make dt_binding_schema", we will see below error: $ make dt_binding_schema make[1]: *** No rule to make target 'dt_binding_schema'. Stop. make: *** [Makefile:224: __sub-make] Error 2 It should be a typo. So this will fix it. Fixes: 604a57ba9781 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json") Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-07scripts: import more list macrosSami Tolvanen
Import list_is_first, list_is_last, list_replace, and list_replace_init. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-06platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errorsHans de Goede
x86_android_tablet_remove() frees the pdevs[] array, so it should not be used after calling x86_android_tablet_remove(). When platform_device_register() fails, store the pdevs[x] PTR_ERR() value into the local ret variable before calling x86_android_tablet_remove() to avoid using pdevs[] after it has been freed. Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs") Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()") Cc: stable@vger.kernel.org Reported-by: Aleksandr Burakov <a.burakov@rosalinux.ru> Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov@rosalinux.ru/ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com
2024-10-06platform/x86: wmi: Update WMI driver API documentationArmin Wolf
The WMI driver core now passes the WMI event data to legacy notify handlers, so WMI devices sharing notification IDs are now being handled properly. Fixes: e04e2b760ddb ("platform/x86: wmi: Pass event data directly to legacy notify handlers") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241005213825.701887-1-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: dell-ddv: Fix typo in documentationAnaswara T Rajan
Fix typo in word 'diagnostics' in documentation. Signed-off-by: Anaswara T Rajan <anaswaratrajan@gmail.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241005070056.16326-1-anaswaratrajan@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: dell-sysman: add support for alienware productsCrag Wang
Alienware supports firmware-attributes and has its own OEM string. Signed-off-by: Crag Wang <crag_wang@dell.com> Link: https://lore.kernel.org/r/20241004152826.93992-1-crag_wang@dell.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86/intel: power-domains: Add Diamond Rapids supportSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support domain id mappings. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: ISST: Add Diamond Rapids to support listSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding to isst_cpu_ids. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>