linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-05-06	powerpc/64/bpf: fix tail calls for PCREL addressing	Hari Bathini
	With PCREL addressing, there is no kernel TOC. So, it is not setup in prologue when PCREL addressing is used. But the number of instructions to skip on a tail call was not adjusted accordingly. That resulted in not so obvious failures while using tailcalls. 'tailcalls' selftest crashed the system with the below call trace: bpf_test_run+0xe8/0x3cc (unreliable) bpf_prog_test_run_skb+0x348/0x778 __sys_bpf+0xb04/0x2b00 sys_bpf+0x28/0x38 system_call_exception+0x168/0x340 system_call_vectored_common+0x15c/0x2ec Also, as bpf programs are always module addresses and a bpf helper in general is a core kernel text address, using PC relative addressing often fails with "out of range of pcrel address" error. Switch to using kernel base for relative addressing to handle this better. Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing") Cc: stable@vger.kernel.org # v6.4+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240502173205.142794-1-hbathini@linux.ibm.com
2024-05-06	Documentation: Fix the address of the linuxppc-dev mailing list	Stephen Rothwell
	This list was moved many years ago. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240503121012.3ba5000b@canb.auug.org.au
2024-05-06	Documentation: Document PowerPC kernel dynamic DEXCR interface	Benjamin Gray
	Documents how to use the PR_PPC_GET_DEXCR and PR_PPC_SET_DEXCR prctl()'s for changing a process's DEXCR or its process tree default value. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-10-bgray@linux.ibm.com
2024-05-06	selftests/powerpc/dexcr: Add chdexcr utility	Benjamin Gray
	Adds a utility to exercise the prctl DEXCR inheritance in the shell. Supports setting and clearing each aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Use correct SPDX license, use execvp() for usability, print errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-9-bgray@linux.ibm.com
2024-05-06	selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr	Benjamin Gray
	Now that the DEXCR can be configured with prctl, add a section in lsdexcr that explains why each aspect is set the way it is. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-8-bgray@linux.ibm.com
2024-05-06	selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest	Benjamin Gray
	Now that a process can control its DEXCR to some extent, make the hashchk tests more reliable by explicitly setting the local and onexec NPHIE aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-7-bgray@linux.ibm.com
2024-05-06	selftests/powerpc/dexcr: Add DEXCR prctl interface test	Benjamin Gray
	Some basic tests of the prctl interface of the DEXCR. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Add missing SPDX tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-6-bgray@linux.ibm.com
2024-05-06	powerpc/dexcr: Add DEXCR prctl interface	Benjamin Gray
	Now that we track a DEXCR on a per-task basis, individual tasks are free to configure it as they like. The interface is a pair of getter/setter prctl's that work on a single aspect at a time (multiple aspects at once is more difficult if there are different rules applied for each aspect, now or in future). The getter shows the current state of the process config, and the setter allows setting/clearing the aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Account for PR_RISCV_SET_ICACHE_FLUSH_CTX, shrink some longs lines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-5-bgray@linux.ibm.com
2024-05-03	powerpc/dexcr: Reset DEXCR value across exec	Benjamin Gray
	Inheriting the DEXCR across exec can have security and usability concerns. If a program is compiled with hash instructions it generally expects to run with NPHIE enabled. But if the parent process disables NPHIE then if it's not careful it will be disabled for any children too and the protection offered by hash checks is basically worthless. This patch introduces a per-process reset value that new execs in a particular process tree are initialized with. This enables fine grained control over what DEXCR value child processes run with by default. For example, containers running legacy binaries that expect hash instructions to act as NOPs could configure the reset value of the container root to control the default reset value for all members of the container. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Add missing SPDX tag on dexcr.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-4-bgray@linux.ibm.com
2024-05-03	powerpc/dexcr: Track the DEXCR per-process	Benjamin Gray
	Add capability to make the DEXCR act as a per-process SPR. We do not yet have an interface for changing the values per task. We also expect the kernel to use a single DEXCR value across all tasks while in privileged state, so there is no need to synchronize after changing it (the userspace aspects will synchronize upon returning to userspace). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-3-bgray@linux.ibm.com
2024-05-03	selftests/powerpc/dexcr: Add -no-pie to hashchk tests	Benjamin Gray
	The hashchk tests want to verify that the hash key is changed over exec. It does so by calculating hashes at the same address across an exec. This is made simpler by disabling PIE functionality, so we can re-execute ourselves and be using the same addresses in the child. While -fno-pie is already added, -no-pie is also required. Fixes: bdb07f35a52f ("selftests/powerpc/dexcr: Add hashst/hashchk test") Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-2-bgray@linux.ibm.com
2024-05-03	powerpc/module: Remove arch specific module bug stuff	Dr. David Alan Gilbert
	The last function to reference module_bug_list went in 2008's commit b9754568ef17 ("powerpc: Remove dead module_find_bug code") but I don't think that was called since 2006's commit 73c9ceab40b1 ("[POWERPC] Generic BUG for powerpc") Now that the list has gone, I think we can also clean up the bug entries in mod_arch_specific. Lightly boot tested. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240503002317.183500-1-linux@treblig.org
2024-05-01	MAINTAINERS: MMU GATHER: Update Aneesh's address	Michael Ellerman
	Aneesh's IBM address no longer works, switch to his preferred kernel.org address. Acked-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240430044327.49363-1-mpe@ellerman.id.au
2024-05-01	MAINTAINERS: powerpc: Remove Aneesh	Michael Ellerman
	Aneesh is stepping down from powerpc maintenance. Acked-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240430044228.49015-1-mpe@ellerman.id.au
2024-04-30	powerpc: Mark memory_limit as initdata	Michael Ellerman
	The `memory_limit` variable should only be used during boot, enforce that by marking it initdata. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422115231.1769984-1-mpe@ellerman.id.au
2024-04-30	macintosh/macio-adb: replace of_node_put() with __free	sundar
	use the new cleanup magic to replace of_node_put() with __free(device_node) marking to auto release when they get out of scope. Suggested-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: sundar <prosunofficial@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240424150718.5006-1-prosunofficial@gmail.com
2024-04-29	selftests/powerpc: Install tests in sub-directories	Michael Ellerman
	The sources for the powerpc selftests are arranged into sub-directories. However when the tests are built and installed, the sub-directories are squashed, losing the structure. For example, with the current code the result of installing the selftests is: $ tree tools/testing/selftests/kselftest_install tools/testing/selftests/kselftest_install ├── kselftest │ ├── ktap_helpers.sh │ ├── module.sh │ ├── prefix.pl │ └── runner.sh ├── kselftest-list.txt ├── powerpc │ ├── alignment_handler │ ├── attr_test │ ├── back_to_back_ebbs_test │ ├── bad_accesses │ ├── bhrb_filter_map_test │ ├── bhrb_no_crash_wo_pmu_test │ ├── blacklisted_events_test │ ├── cache_shape │ ├── close_clears_pmcc_test │ ├── context_switch │ ├── copy_first_unaligned ... │ ├── settings ... │ └── wild_bctr └── run_kselftest.sh All the powerpc tests are squashed into the single powerpc directory. In particular, note that there is a single `settings` file, even though there are multiple settings files in the powerpc selftest sources. One of the settings files ends up installed, depending on install order, even if they have different contents. Similarly if there were two tests with the same name in different sub-directories they would clobber each other. Fix it by replicating the directory structure of the source tree into the install directory. The result being for example: $ tree tools/testing/selftests/kselftest_install tools/testing/selftests/kselftest_install ├── kselftest │ ├── ktap_helpers.sh │ ├── module.sh │ ├── prefix.pl │ └── runner.sh ├── kselftest-list.txt ├── powerpc │ ├── alignment │ │ ├── alignment_handler │ │ └── copy_first_unaligned │ ├── benchmarks │ │ ├── context_switch │ │ ├── exec_target │ │ ├── fork │ │ ├── futex_bench │ │ ├── gettimeofday │ │ ├── mmap_bench │ │ ├── null_syscall │ │ └── settings ... │ ├── eeh │ │ ├── eeh-basic.sh │ │ ├── eeh-functions.sh │ │ └── settings ... │ └── vphn │ └── test-vphn └── run_kselftest.sh Note multiple settings files in different sub-directories. This change also has the effect of changing the names of the tests from the point of view of the kselftest runner. Before the tests are named eg: powerpc:copy_first_unaligned powerpc:cache_shape powerpc:reg_access_test After, the test collection names include the sub-directory: powerpc/alignment:copy_first_unaligned powerpc/cache_shape:cache_shape powerpc/pmu/ebb:reg_access_test That means whereas previously all powerpc tests could be run with: $ ./run_kselftest.sh -c powerpc After the change it's necessary to pass a regex that matches all powerpc entries, eg: $ ./run_kselftest.sh -c "powerpc.*" The latter form also works before and after the change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422133453.1793988-2-mpe@ellerman.id.au
2024-04-29	selftests/powerpc: Convert pmu Makefile to for loop style	Michael Ellerman
	The pmu Makefile has grown more sub directories over the years. Rather than open coding the rules for each subdir, use for loops. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422133453.1793988-1-mpe@ellerman.id.au
2024-04-29	selftests/powerpc: make sub-folders buildable on their own	Madhavan Srinivasan
	Build breaks when executing make with run_tests for sub-folders under powerpc. This is because, CFLAGS and GIT_VERSION macros are defined in Makefile of toplevel powerpc folder. make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/mm' gcc hugetlb_vs_thp_test.c ../harness.c ../utils.c -o /home/maddy/selftest_output//hugetlb_vs_thp_test hugetlb_vs_thp_test.c:6:10: fatal error: utils.h: No such file or directory 6 \| #include "utils.h" \| ^~~~~~~~~ compilation terminated. Fix this by adding the flags.mk in each sub-folder Makefile. Also remove the CFLAGS and GIT_VERSION macros from powerpc/ folder Makefile since the same is definied in flags.mk Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240229093711.581230-3-maddy@linux.ibm.com
2024-04-29	selftests/powerpc: Add flags.mk to support pmu buildable	Madhavan Srinivasan
	When running `make -C powerpc/pmu run_tests` from top level selftests directory, currently this error is being reported: make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/pmu' Makefile:40: warning: overriding recipe for target 'emit_tests' ../../lib.mk:111: warning: ignoring old recipe for target 'emit_tests' gcc -m64 count_instructions.c ../harness.c event.c lib.c ../utils.c loop.S -o /home/maddy/selftest_output//count_instructions In file included from count_instructions.c:13: event.h:12:10: fatal error: utils.h: No such file or directory 12 \| #include "utils.h" \| ^~~~~~~~~ compilation terminated. This is due to missing of include path in CFLAGS. That is, CFLAGS and GIT_VERSION macros are defined in the powerpc/ folder Makefile which in this case is not involved. To address the failure in case of executing specific sub-folder test directly, a new rule file has been addded by the patch called "flags.mk" under selftest/powerpc/ folder and is linked to all the Makefile of powerpc/pmu sub-folders. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> [mpe: Fixup ifeq, make GIT_VERSION simply expanded to avoid re-executing git describe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240229093711.581230-2-maddy@linux.ibm.com
2024-04-29	selftests/powerpc: Re-order *FLAGS to follow lib.mk	Madhavan Srinivasan
	In some powerpc/ sub-folder Makefiles, CFLAGS are defined before lib.mk include. Clean it up by re-ordering the flags to follow after the mk include. This is needed to support sub-folders in powerpc/ buildable on its own. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240229093711.581230-1-maddy@linux.ibm.com
2024-04-29	powerpc/pseries/vio: Don't return ENODEV if node or compatible missing	Lidong Zhong
	We noticed the following nuisance messages during boot process: vio vio: uevent: failed to send synthetic uevent vio 4000: uevent: failed to send synthetic uevent vio 4001: uevent: failed to send synthetic uevent vio 4002: uevent: failedto send synthetic uevent vio 4004: uevent: failed to send synthetic uevent It's caused by either vio_register_device_node() failing to set dev->of_node or the node is missing a "compatible" property. To match the definition of modalias in modalias_show(), remove the return of ENODEV in such cases. The failure messages is also suppressed with this change. Signed-off-by: Lidong Zhong <lidong.zhong@suse.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com
2024-04-29	powerpc/pseries: Enforce hcall result buffer validity and size	Nathan Lynch
	plpar_hcall(), plpar_hcall9(), and related functions expect callers to provide valid result buffers of certain minimum size. Currently this is communicated only through comments in the code and the compiler has no idea. For example, if I write a bug like this: long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...); This compiles with no diagnostics emitted, but likely results in stack corruption at runtime when plpar_hcall9() stores results past the end of the array. (To be clear this is a contrived example and I have not found a real instance yet.) To make this class of error less likely, we can use explicitly-sized array parameters instead of pointers in the declarations for the hcall APIs. When compiled with -Warray-bounds[1], the code above now provokes a diagnostic like this: error: array argument is too small; is of size 32, callee requires at least 72 [-Werror,-Warray-bounds] 60 \| plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, \| ^ ~~~~~~ [1] Enabled for LLVM builds but not GCC for now. See commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and related changes. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com
2024-04-29	powerpc/dart: Drop unnecessary call to kmemleak_no_scan()	Michael Ellerman
	Erhard reported that kmemleak was showing a warning at boot: kmemleak: Not scanning unknown object at 0xc00000007f000000 CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2 Call Trace: .dump_stack_lvl+0x7c/0xc4 (unreliable) .kmemleak_no_scan+0xe0/0x100 .iommu_init_early_dart+0x2f0/0x924 .pmac_probe+0x1b0/0x20c .setup_arch+0x1b8/0x674 .start_kernel+0xdc/0xb74 start_here_common+0x1c/0x44 DART table allocated at: (____ptrval____) Which he bisected to a change in kmemleak, commit 23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"). Because pmac_probe() is called before mem_topology_setup(), the min/ max PFN variables are still zero. That causes kmemleak_alloc_phys() to ignore the allocation, because the checks against the PFN fail. Then kmemleak_no_scan() can't find the allocation and prints warning. Given that kmemleak_alloc_phys() is ignoring the allocation to begin with, there's no need to call kmemleak_no_scan() at all, which avoids the warning. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/bug-216156-206035@https.bugzilla.kernel.org%2F/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240419115913.3317575-1-mpe@ellerman.id.au
2024-04-29	powerpc/eeh: Permanently disable the removed device	Ganesh Goudar
	When a device is hot removed on powernv, the hotplug driver clears the device's state. However, on pseries, if a device is removed by phyp after reaching the error threshold, the kernel remains unaware, leading to the device not being torn down. This prevents necessary remediation actions like failover. Permanently disable the device if the presence check fails. Also, in eeh_dev_check_failure in we may consider the error as false positive if the device is hotpluged out as the get_state call returns EEH_STATE_NOT_SUPPORT and we may end up not clearing the device state, so log the event if the state is not moved to permanent failure state. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422075737.1405551-1-ganeshgr@linux.ibm.com
2024-04-29	Documentation/powerpc: update fadump implementation details	Sourabh Jain
	The patch titled ("powerpc: make fadump resilient with memory add/remove events") has made significant changes to the implementation of fadump, particularly on elfcorehdr creation and fadump crash info header structure. Therefore, updating the fadump implementation documentation to reflect those changes. Following updates are done to firmware assisted dump documentation: 1. The elfcorehdr is no longer stored after fadump HDR in the reserved dump area. Instead, the second kernel dynamically allocates memory for the elfcorehdr within the address range from 0 to the boot memory size. Therefore, update figures 1 and 2 of Memory Reservation during the first and second kernels to reflect this change. 2. A version field has been added to the fadump header to manage the future changes to fadump crash info header structure without changing the fadump header magic number in the future. Therefore, remove the corresponding TODO from the document. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422195932.1583833-4-sourabhjain@linux.ibm.com
2024-04-29	powerpc/fadump: add hotplug_ready sysfs interface	Sourabh Jain
	The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events. Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system. However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events. Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422195932.1583833-3-sourabhjain@linux.ibm.com
2024-04-29	powerpc: make fadump resilient with memory add/remove events	Sourabh Jain
	Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events is referred as memory add/remove events in reset of the commit message. The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information. There are several notable issues associated with re-registering fadump for every memory add/remove events. 1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled. 2. Re-registering fadump for every memory add/remove event is inefficient. 3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption. Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events. At present, the first kernel prepares fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas\|opal]_fadump_process: 1. Sanity check for fadump header 2. Update CPU notes in elfcorehdr Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr and set its address to the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel. Section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already. To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges. At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to fadump kernel. In addition to the vmcoreinfo address and size, there are a few other attributes also added to the fadump_crash_info_header structure. 1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption. 2. pt_regs_sz/cpu_mask_sz: Store size of pt_regs and cpu_mask structure of first kernel. These attributes are used to prevent dump processing if the sizes of pt_regs or cpu_mask structure differ between the first and fadump kernels. Note: if either first/crashed kernel or second/fadump kernel do not have the changes introduced here then kernel fail to collect the dump and prints relevant error message on the console. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
2024-04-29	powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp	Shrikanth Hegde
	Couple of Minor fixes: - hcall return values are long. Fix that for h_get_mpp, h_get_ppp and parse_ppp_data - If hcall fails, values set should be at-least zero. It shouldn't be uninitialized values. Fix that for h_get_mpp and h_get_ppp Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com
2024-04-29	powerpc/pseries: Add pool idle time at LPAR boot	Shrikanth Hegde
	When there are no options specified for lparstat, it is expected to give reports since LPAR(Logical Partition) boot. APP(Available Processor Pool) is an indicator of how many cores in the shared pool are free to use in Shared Processor LPAR(SPLPAR). APP is derived using pool_idle_time which is obtained using H_PIC call. The interval based reports show correct APP value while since boot report shows very high APP values. This happens because in that case APP is obtained by dividing pool idle time by LPAR uptime. Since pool idle time is reported by the PowerVM hypervisor since its boot, it need not align with LPAR boot. To fix that export boot pool idle time in lparcfg and powerpc-utils will use this info to derive APP as below for since boot reports. APP = (pool idle time - boot pool idle time) / (uptime * timebase) Results:: Observe APP values. ====================== Shared LPAR ================================ lparstat System Configuration type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 reboot stress-ng --cpu=$(nproc) -t 600 sleep 600 So in this case app is expected to close to 37-6=31. ====== 6.9-rc1 and lparstat 1.3.10 ============= %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 === With this patch and powerpc-utils patch to do the above equation === %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 ===================================================================== Note: physc, purr/idle purr being inaccurate is being handled in a separate patch in powerpc-utils tree. Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com
2024-04-19	powerpc/mm: Update the memory limit based on direct mapping restrictions	Aneesh Kumar K.V (IBM)
	memory limit value specified by the user are further updated such that the value is 16MB aligned. This is because hash translation mode use 16MB as direct mapping page size. Make sure we update the global variable 'memory_limit' with the 16MB aligned value such that all kernel components will see the new aligned value of the memory limit. Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240403083611.172833-3-aneesh.kumar@kernel.org
2024-04-19	powerpc/fadump: Don't update the user-specified memory limit	Aneesh Kumar K.V (IBM)
	If the user specifies the memory limit, the kernel should honor it such that all allocation and reservations are made within the memory limit specified. fadump was breaking that rule. Remove the code which updates the memory limit such that fadump reservations are done within the limit specified. Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240403083611.172833-2-aneesh.kumar@kernel.org
2024-04-19	powerpc/mm: Align memory_limit value specified using mem= kernel parameter	Aneesh Kumar K.V (IBM)
	The value specified for the memory limit is used to set a restriction on memory usage. It is important to ensure that this restriction is within the linear map kernel address space range. The hash page table translation uses a 16MB page size to map the kernel linear map address space. htab_bolt_mapping() function aligns down the size of the range while mapping kernel linear address space. Since the memblock limit is enforced very early during boot, before we can detect the type of memory translation (radix vs hash), we align the memory limit value specified as a kernel parameter to 16MB. This alignment value will work for both hash and radix translations. Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> Acked-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240403083611.172833-1-aneesh.kumar@kernel.org
2024-04-18	powerpc/ptdump: Fix walk_vmemmap() to also print first vmemmap entry	Ritesh Harjani (IBM)
	Currently walk_vmemmap() skips the first vmemmap entry pointed to by vmemmap_list pointer itself. Fix that. With the fix applied the vmemmap entry at 0xc00c000000000000 for hash is displayed: $ cat /sys/kernel/debug/kernel_hash_pagetable ... 0xc00c000000010000: AVPN:cd7bd4e0000 ssize: 1T ... 0xc00c000000000000: AVPN:cd7bd4e0000 ssize: 1T ... Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [mpe: Tweak change log wording and add example output] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a19ee3dc2b304d39da364a592d5cd167449f8c4a.1713365940.git.ritesh.list@gmail.com
2024-04-15	powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.	Mahesh Salgaonkar
	nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel crash when invoked during real mode interrupt handling (e.g. early HMI/MCE interrupt handler) if percpu allocation comes from vmalloc area. Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI() wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when percpu allocation is from the embedded first chunk. However with CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu allocation can come from the vmalloc area. With kernel command line "percpu_alloc=page" we can force percpu allocation to come from vmalloc area and can see kernel crash in machine_check_early: [ 1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110 [ 1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0 [ 1.215719] --- interrupt: 200 [ 1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable) [ 1.215722] [c000000fffd731b0] [0000000000000000] 0x0 [ 1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8 Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu first chunk is not embedded. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Shirisha Ganta <shirisha@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
2024-04-15	powerpc: Add static_key_feature_checks_initialized flag	Nicholas Miehlbradt
	JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_intialized to determine whether {cpu,mmu}_has_feature() is used before static keys were initialized. However, {cpu,mmu}_has_feature() should not be used before setup_feature_keys() is called but static_key_initialized is set well before this by the call to jump_label_init() in early_init_devtree(). This creates a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG will not detect misuse and report errors. Add a flag specifically to indicate when {cpu,mmu}_has_feature() is safe to use. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240408052358.5030-1-nicholas@linux.ibm.com
2024-04-08	powerpc/52xx: Replace of_gpio.h by proper one	Andy Shevchenko
	of_gpio.h is deprecated and subject to remove. The driver doesn't use it directly, replace it with what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240313135645.2066362-1-andriy.shevchenko@linux.intel.com
2024-04-08	powerpc: Fix fatal warnings flag for LLVM's integrated assembler	Nathan Chancellor
	When building with LLVM_IAS=1, there is an error because '-fatal-warnings' is not recognized as a valid flag: clang: error: unsupported argument '-fatal-warnings' to option '-Wa,' Use the double hyphen version of the flag, '--fatal-warnings', which works with both the GNU assembler and LLVM's integrated assembler. Fixes: 608d4a5ca563 ("powerpc: Error on assembly warnings") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240405-ppc-fix-wa-fatal-warnings-clang-v1-1-bdcd969f2ef0@kernel.org
2024-04-03	powerpc: Fix PS3 allmodconfig warning	Geoff Levand
	The struct ps3_notification_device in the ps3_probe_thread routine is too large to be on the stack, causing a warning for an allmodconfig build with clang. Change the struct ps3_notification_device from a variable on the stack to a dynamically allocated variable. Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d64f06f4-81ae-4ec5-ab3b-d7f7f091e0ac@infradead.org
2024-04-03	powerpc: Error on assembly warnings	Benjamin Gray
	We currently enable -Werror on the arch/powerpc subtree. However this only catches C warnings. Assembly warnings are logged, but the make invocation will still succeed. This can allow incorrect syntax such as ori r3, r4, r5 to be compiled without catching that the assembler is treating r5 as the immediate value 5. To prevent this in assembly files and inline assembly, add the -fatal-warnings option to assembler invocations. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Tested-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326044420.577031-1-bgray@linux.ibm.com
2024-04-03	powerpc/fsl-soc: hide unused const variable	Arnd Bergmann
	vmpic_msi_feature is only used conditionally, which triggers a rare -Werror=unused-const-variable= warning with gcc: arch/powerpc/sysdev/fsl_msi.c:567:37: error: 'vmpic_msi_feature' defined but not used [-Werror=unused-const-variable=] 567 \| static const struct fsl_msi_feature vmpic_msi_feature = Hide this one in the same #ifdef as the reference so we can turn on the warning by default. Fixes: 305bcf26128e ("powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240403080702.3509288-2-arnd@kernel.org
2024-04-03	powerpc: Use str_plural() in cpu_init_thread_core_maps()	Thorsten Blum
	Fixes the following Coccinelle/coccicheck warning reported by string_choices.cocci: opportunity for str_plural(tpc) Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240331222249.107467-2-thorsten.blum@toblux.com
2024-03-31	Linux 6.9-rc2v6.9-rc2	Linus Torvalds

2024-03-31	Merge tag 'kbuild-fixes-v6.9' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Deduplicate Kconfig entries for CONFIG_CXL_PMU - Fix unselectable choice entry in MIPS Kconfig, and forbid this structure - Remove unused include/asm-generic/export.h - Fix a NULL pointer dereference bug in modpost - Enable -Woverride-init warning consistently with W=1 - Drop KCSAN flags from .mod.c files tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: Fix typo HEIGTH to HEIGHT Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries kbuild: make -Woverride-init warnings more consistent modpost: do not make find_tosym() return NULL export.h: remove include/asm-generic/export.h kconfig: do not reparent the menu inside a choice block MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
2024-03-31	Merge tag 'edac_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix more issues in the AMD FMPM driver * tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: RAS: Avoid build errors when CONFIG_DEBUG_FS=n RAS/AMD/FMPM: Safely handle saved records of various sizes RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
2024-03-31	Merge tag 'irq_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix an unused function warning on irqchip/irq-armada-370-xp - Fix the IRQ sharing with pinctrl-amd and ACPI OSL * tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-370-xp: Suppress unused-function warning genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
2024-03-31	Merge tag 'perf_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Define the correct set of default hw events on AMD Zen4 - Use the correct stalled cycles PMCs on AMD Zen2 and newer - Fix detection of the LBR freeze feature on AMD * tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later perf/x86/amd/lbr: Use freeze based on availability x86/cpufeatures: Add new word for scattered features
2024-03-31	Merge tag 'timers_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers update from Borislav Petkov: - Volunteer in Anna-Maria and Frederic as timers co-maintainers so that tglx can relax more :-P * tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add co-maintainers for time[rs]
2024-03-31	Merge tag 'objtool_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Fix a format specifier build error in objtool during an x32 build * tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix compile failure when using the x32 compiler
2024-03-31	Merge tag 'x86_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure single object builds in arch/x86/virt/ ala make ... arch/x86/virt/vmx/tdx/seamcall.o work again - Do not do ROM range scans and memory validation when the kernel is running as a SEV-SNP guest as those can get problematic and, before that, are not really needed in such a guest - Exclude the build-time generated vdso-image-x32.o object from objtool validation and in particular the return sites in there due to a warning which fires when an unpatched return thunk is being used - Improve the NMI CPUs stall message to show additional information about the state of each CPU wrt the NMI handler - Enable gcc named address spaces support only on !KCSAN configs due to compiler options incompatibility - Revert a change which was trying to use GB pages for mapping regions only when the regions would be large enough but that change lead to kexec failing - A documentation fixlet * tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Use obj-y to descend into arch/x86/virt/ x86/sev: Skip ROM range scans and validation for SEV-SNP guests x86/vdso: Fix rethunk patching for vdso-image-x32.o too x86/nmi: Upgrade NMI backtrace stall checks & messages x86/percpu: Disable named address spaces for KCSAN Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped." Documentation/x86: Fix title underline length