summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-20powerpc/pseries: hvcall.h: add H_WATCHDOG opcode, H_NOOP return codeScott Cheloha
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits guest control of one or more virtual watchdog timers. Add the opcode for the H_WATCHDOG hypercall to hvcall.h. While here, add a definition for H_NOOP, a possible return code for H_WATCHDOG. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713202335.1217647-2-cheloha@linux.ibm.com
2022-07-18powerpc/52xx: Mark gpt driver as not removableUwe Kleine-König
Returning an error code (here -EBUSY) from a remove callback doesn't prevent the driver from being unloaded. The only effect is that an error message is emitted and the driver is removed anyhow. So instead drop the remove function (which is equivalent to returning zero) and set the suppress_bind_attrs property to make it impossible to unload the driver via sysfs. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220612213400.159257-1-u.kleine-koenig@pengutronix.de
2022-07-18docs: ABI: sysfs-bus-event_source-devices: Document sysfs caps entry for PMUAthira Rajeev
Details is added about "caps" attribute group in the ABI documentation. This is used to expose some of the PMU attributes in "caps" directory under : /sys/bus/event_source/devices/<dev>/. The dev/caps will contain information about features that platform specific PMU supports. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520084630.15181-2-atrajeev@linux.vnet.ibm.com
2022-07-18powerpc/perf: Add support for caps under sysfs in powerpcAthira Rajeev
Add caps support under "/sys/bus/event_source/devices/<pmu>/" for powerpc. This directory can be used to expose some of the specific features that powerpc PMU supports to the user. Example: pmu_name. The name of PMU registered will depend on platform, say power9 or power10 or it could be Generic Compat PMU. Currently the only way to know which is the registered PMU is from the dmesg logs. But clearing the dmesg will make it difficult to know exact PMU backend used. And even extracting from dmesg will be complicated, as we need to parse the dmesg logs and add filters for pmu name. Whereas by exposing it via caps will make it easy as we just need to directly read it from the sysfs. Add a caps directory to /sys/bus/event_source/devices/cpu/ for power8, power9, power10 and generic compat PMU in respective PMU driver code. Update the pmu_name file under caps folder in core-book3s using "attr_update". The information exposed currently: - pmu_name : Underlying PMU name from the driver Example result with power9 pmu: # ls /sys/bus/event_source/devices/cpu/caps pmu_name # cat /sys/bus/event_source/devices/cpu/caps/pmu_name POWER9 Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520084630.15181-1-atrajeev@linux.vnet.ibm.com
2022-07-18powerpc/perf: Give generic PMU a nice nameJoel Stanley
When booting on a machine that uses the compat pmu driver we see this: [ 0.071192] GENERIC_COMPAT performance monitor hardware support registered Which is a bit shouty. Give it a nicer name. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610044006.2095806-1-joel@jms.id.au
2022-07-09Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge KVM related commits we are keeping in a topic branch in case of any conflicts with generic KVM changes.
2022-07-09Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch. In particular this brings in commit 986481618023 ("powerpc/book3e: Fix PUD allocation size in map_kernel_page()") which fixes a build failure in next, because commit 2db2008e6363 ("powerpc/64e: Rewrite p4d_populate() as a static inline function") depends on it.
2022-07-04powerpc/powernv: delay rng platform device creation until later in bootJason A. Donenfeld
The platform device for the rng must be created much later in boot. Otherwise it tries to connect to a parent that doesn't yet exist, resulting in this splat: [ 0.000478] kobject: '(null)' ((____ptrval____)): is not initialized, yet kobject_get() is being called. [ 0.002925] [c000000002a0fb30] [c00000000073b0bc] kobject_get+0x8c/0x100 (unreliable) [ 0.003071] [c000000002a0fba0] [c00000000087e464] device_add+0xf4/0xb00 [ 0.003194] [c000000002a0fc80] [c000000000a7f6e4] of_device_add+0x64/0x80 [ 0.003321] [c000000002a0fcb0] [c000000000a800d0] of_platform_device_create_pdata+0xd0/0x1b0 [ 0.003476] [c000000002a0fd00] [c00000000201fa44] pnv_get_random_long_early+0x240/0x2e4 [ 0.003623] [c000000002a0fe20] [c000000002060c38] random_init+0xc0/0x214 This patch fixes the issue by doing the platform device creation inside of machine_subsys_initcall. Fixes: f3eac426657d ("powerpc/powernv: wire up rng during setup_arch") Cc: stable@vger.kernel.org Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> [mpe: Change "of node" to "platform device" in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220630121654.1939181-1-Jason@zx2c4.com
2022-06-29powerpc/memhotplug: Add add_pages override for PPCAneesh Kumar K.V
With commit ffa0b64e3be5 ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit") the kernel now validate the addr against high_memory value. This results in the below BUG_ON with dax pfns. [ 635.798741][T26531] kernel BUG at mm/page_alloc.c:5521! 1:mon> e cpu 0x1: Vector: 700 (Program Check) at [c000000007287630] pc: c00000000055ed48: free_pages.part.0+0x48/0x110 lr: c00000000053ca70: tlb_finish_mmu+0x80/0xd0 sp: c0000000072878d0 msr: 800000000282b033 current = 0xc00000000afabe00 paca = 0xc00000037ffff300 irqmask: 0x03 irq_happened: 0x05 pid = 26531, comm = 50-landscape-sy kernel BUG at :5521! Linux version 5.19.0-rc3-14659-g4ec05be7c2e1 (kvaneesh@ltc-boston8) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #625 SMP Thu Jun 23 00:35:43 CDT 2022 1:mon> t [link register ] c00000000053ca70 tlb_finish_mmu+0x80/0xd0 [c0000000072878d0] c00000000053ca54 tlb_finish_mmu+0x64/0xd0 (unreliable) [c000000007287900] c000000000539424 exit_mmap+0xe4/0x2a0 [c0000000072879e0] c00000000019fc1c mmput+0xcc/0x210 [c000000007287a20] c000000000629230 begin_new_exec+0x5e0/0xf40 [c000000007287ae0] c00000000070b3cc load_elf_binary+0x3ac/0x1e00 [c000000007287c10] c000000000627af0 bprm_execve+0x3b0/0xaf0 [c000000007287cd0] c000000000628414 do_execveat_common.isra.0+0x1e4/0x310 [c000000007287d80] c00000000062858c sys_execve+0x4c/0x60 [c000000007287db0] c00000000002c1b0 system_call_exception+0x160/0x2c0 [c000000007287e10] c00000000000c53c system_call_common+0xec/0x250 The fix is to make sure we update high_memory on memory hotplug. This is similar to what x86 does in commit 3072e413e305 ("mm/memory_hotplug: introduce add_pages") Fixes: ffa0b64e3be5 ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220629050925.31447-1-aneesh.kumar@linux.ibm.com
2022-06-29powerpc/bpf: Fix use of user_pt_regs in uapiNaveen N. Rao
Trying to build a .c file that includes <linux/bpf_perf_event.h>: $ cat test_bpf_headers.c #include <linux/bpf_perf_event.h> throws the below error: /usr/include/linux/bpf_perf_event.h:14:28: error: field ‘regs’ has incomplete type 14 | bpf_user_pt_regs_t regs; | ^~~~ This is because we typedef bpf_user_pt_regs_t to 'struct user_pt_regs' in arch/powerpc/include/uaps/asm/bpf_perf_event.h, but 'struct user_pt_regs' is not exposed to userspace. Powerpc has both pt_regs and user_pt_regs structures. However, unlike arm64 and s390, we expose user_pt_regs to userspace as just 'pt_regs'. As such, we should typedef bpf_user_pt_regs_t to 'struct pt_regs' for userspace. Within the kernel though, we want to typedef bpf_user_pt_regs_t to 'struct user_pt_regs'. Remove arch/powerpc/include/uapi/asm/bpf_perf_event.h so that the uapi/asm-generic version of the header is exposed to userspace. Introduce arch/powerpc/include/asm/bpf_perf_event.h so that we can typedef bpf_user_pt_regs_t to 'struct user_pt_regs' for use within the kernel. Note that this was not showing up with the bpf selftest build since tools/include/uapi/asm/bpf_perf_event.h didn't include the powerpc variant. Fixes: a6460b03f945ee ("powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Use typical naming for header include guard] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220627191119.142867-1-naveen.n.rao@linux.vnet.ibm.com
2022-06-29powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routersPali Rohár
CZ.NIC Turris 1.0 and 1.1 are open source routers, they have dual-core PowerPC Freescale P2020 CPU and are based on Freescale P2020RDB-PC-A board. Hardware design is fully open source, all firmware and hardware design files are available at Turris project website: https://docs.turris.cz/hw/turris-1x/turris-1x/ https://project.turris.cz/en/hardware.html Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220624085550.20570-1-pali@kernel.org
2022-06-29KVM: PPC: Kconfig: Fix indentationJuerg Haefliger
The convention for indentation seems to be a single tab. Help text is further indented by an additional two whitespaces. Fix the lines that violate these rules. Signed-off-by: Juerg Haefliger <juergh@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520115431.147593-1-juergh@canonical.com
2022-06-29powerpc/powernv: Kconfig: Replace single quotesJuerg Haefliger
Replace single quotes with double quotes which seems to be the convention for strings. Signed-off-by: Juerg Haefliger <juergh@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520115229.147368-1-juergh@canonical.com
2022-06-29powerpc: Kconfig.debug: Remove extra empty lineJuerg Haefliger
Remove a stray extra empty line. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220526065737.86370-3-juerg.haefliger@canonical.com
2022-06-29powerpc: Kconfig: Replace tabs with whitespacesJuerg Haefliger
Replace tabs after keywords with whitespaces to be consistent. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220526065737.86370-2-juerg.haefliger@canonical.com
2022-06-29powerpc/perf: Update MMCR2 to support event exclude_idleMadhavan Srinivasan
struct perf_event_attr supports exclude counting of idle task. This is sent to kernel via perf_event_attr.exclude_idle and in perf tool, user can use ":I" event modifier to enable this for specific event. Monitor Mode Control Register 2 (MMCR2) SPR has control bits for each PMCs to freeze counting based on the Control Register CTRL[RUN] state. CTRL[RUN] is not set when idle task is running. Patch adds a check for event attr.exclude_idle to set MMCR2[FCnWAIT] bit. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210429050208.266619-1-maddy@linux.ibm.com
2022-06-29powerpc/pseries/iommu: Print ibm,query-pe-dma-windows parametersAlexey Kardashevskiy
PowerVM has a stricter policy about allocating TCEs for LPARs and often there is not enough TCEs for 1:1 mapping, this adds the supported numbers into dev_info() to help analyzing bugreports. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220601040117.1467710-1-aik@ozlabs.ru
2022-06-29KVM: PPC: Do not warn when userspace asked for too big TCE tableAlexey Kardashevskiy
KVM manages emulated TCE tables for guest LIOBNs by a two level table which maps up to 128TiB with 16MB IOMMU pages (enabled in QEMU by default) and MAX_ORDER=11 (the kernel's default). Note that the last level of the table is allocated when actual TCE is updated. However these tables are created via ioctl() on kvmfd and the userspace can trigger WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp) in mm/page_alloc.c and flood dmesg. This adds __GFP_NOWARN. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220628080228.1508847-1-aik@ozlabs.ru
2022-06-29powerpc/bpf/32: Add instructions for atomic_[cmp]xchgHari Bathini
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc32, both of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg operation fundamentally has 3 operands, but we only have two register fields. Therefore the operand we compare against (the kernel's API calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's atomic_cmpxchg returns the previous value at dst_reg + off. JIT the same for BPF too with return value put in BPF_REG_0. BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg); Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le) Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610155552.25892-6-hbathini@linux.ibm.com
2022-06-29powerpc/bpf/32: add support for BPF_ATOMIC bitwise operationsHari Bathini
Adding instructions for ppc32 for atomic_and atomic_or atomic_xor atomic_fetch_add atomic_fetch_and atomic_fetch_or atomic_fetch_xor Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le) Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610155552.25892-5-hbathini@linux.ibm.com
2022-06-29powerpc/bpf/64: Add instructions for atomic_[cmp]xchgHari Bathini
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc64, both of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg operation fundamentally has 3 operands, but we only have two register fields. Therefore the operand we compare against (the kernel's API calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's atomic_cmpxchg returns the previous value at dst_reg + off. JIT the same for BPF too with return value put in BPF_REG_0. BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg); Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le) Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610155552.25892-4-hbathini@linux.ibm.com
2022-06-29powerpc/bpf/64: add support for atomic fetch operationsHari Bathini
Adding instructions for ppc64 for atomic[64]_fetch_add atomic[64]_fetch_and atomic[64]_fetch_or atomic[64]_fetch_xor Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le) Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610155552.25892-3-hbathini@linux.ibm.com
2022-06-29powerpc/bpf/64: add support for BPF_ATOMIC bitwise operationsHari Bathini
Adding instructions for ppc64 for atomic[64]_and atomic[64]_or atomic[64]_xor Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le) Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610155552.25892-2-hbathini@linux.ibm.com
2022-06-29powerpc/64s: Don't read H_BLOCK_REMOVE characteristics in radix modeLaurent Dufour
There is no need to read the H_BLOCK_REMOVE characteristics when running in Radix mode because this hcall is never called. Furthermore since the commit 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") define pseries_lpar_read_hblkrm_characteristics as un empty function if CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220523164353.26441-1-ldufour@linux.ibm.com
2022-06-29powerpc/papr_scm: use dev_get_drvdataHaowen Bai
Eliminate direct accesses to the driver_data field. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1653988790-19999-1-git-send-email-baihaowen@meizu.com
2022-06-29powerpc/64: Drop ppc_inst_as_str()Michael Ellerman
The ppc_inst_as_str() macro tries to make printing variable length, aka "prefixed", instructions convenient. It mostly succeeds, but it does hide an on-stack buffer, which triggers stack protector. More problematically it doesn't compile at all with GCC 12, with -Wdangling-pointer, due to the fact that it returns the char buffer declared inside the macro: arch/powerpc/kernel/trace/ftrace.c: In function '__ftrace_modify_call': ./include/linux/printk.h:475:44: error: using a dangling pointer to '__str' [-Werror=dangling-pointer=] 475 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__) ... arch/powerpc/kernel/trace/ftrace.c:567:17: note: in expansion of macro 'pr_err' 567 | pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op)); | ^~~~~~ ./arch/powerpc/include/asm/inst.h:156:14: note: '__str' declared here 156 | char __str[PPC_INST_STR_LEN]; \ | ^~~~~ This could be fixed by having the caller declare the buffer, but in some places there'd need to be two buffers. In all cases where ppc_inst_as_str() is used the output is not really meant for user consumption, it's almost always indicative of a kernel bug. A simpler solution is to just print the value as an unsigned long. For normal instructions the output is identical. For prefixed instructions the value is printed as a single 64-bit quantity, whereas previously the low half was printed first. But that is good enough for debug output, especially as prefixed instructions will be rare in kernel code in practice. Old: c000000000111170 60420000 ori r2,r2,0 c000000000111174 04100001 e580fb00 .long 0xe580fb0004100001 New: c00000000010f90c 60420000 ori r2,r2,0 c00000000010f910 e580fb0004100001 .long 0xe580fb0004100001 Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Reported-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20220531065936.3674348-1-mpe@ellerman.id.au
2022-06-29selftests/powerpc: Add missing files to .gitignoresMichael Ellerman
These were missed when the respective tests were added, add them now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220616070705.1941829-1-mpe@ellerman.id.au
2022-06-29KVM: PPC: Align pt_regs in kvm_vcpu_arch structureFabiano Rosas
The H_ENTER_NESTED hypercall receives as second parameter the address of a region of memory containing the values for the nested guest privileged registers. We currently use the pt_regs structure contained within kvm_vcpu_arch for that end. Most hypercalls that receive a memory address expect that region to not cross a 4K page boundary. We would want H_ENTER_NESTED to follow the same pattern so this patch ensures the pt_regs structure sits within a page. Note: the pt_regs structure is currently 384 bytes in size, so aligning to 512 is sufficient to ensure it will not cross a 4K page and avoids punching too big a hole in struct kvm_vcpu_arch. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Murilo Opsfelder Araújo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220624142712.790491-1-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: tracing: Add missing hcall namesFabiano Rosas
The kvm_trace_symbol_hcall macro is missing several of the hypercalls defined in hvcall.h. Add the most common ones that are issued during guest lifetime, including the ones that are only used by QEMU and SLOF. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220614165204.549229-1-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry pathFabiano Rosas
Alter the data collection points for the debug timing code in the P9 path to be more in line with what the code does. The points where we accumulate time are now the following: vcpu_entry: From vcpu_run_hv entry until the start of the inner loop; guest_entry: From the start of the inner loop until the guest entry in asm; in_guest: From the guest entry in asm until the return to KVM C code; guest_exit: From the return into KVM C code until the corresponding hypercall/page fault handling or re-entry into the guest; hypercall: Time spent handling hcalls in the kernel (hcalls can go to QEMU, not accounted here); page_fault: Time spent handling page faults; vcpu_exit: vcpu_run_hv exit (almost no code here currently). Like before, these are exposed in debugfs in a file called "timings". There are four values: - number of occurrences of the accumulation point; - total time the vcpu spent in the phase in ns; - shortest time the vcpu spent in the phase in ns; - longest time the vcpu spent in the phase in ns; === Before: rm_entry: 53132 16793518 256 4060 rm_intr: 53132 2125914 22 340 rm_exit: 53132 24108344 374 2180 guest: 53132 40980507996 404 9997650 cede: 0 0 0 0 After: vcpu_entry: 34637 7716108 178 4416 guest_entry: 52414 49365608 324 747542 in_guest: 52411 40828715840 258 9997480 guest_exit: 52410 19681717182 826 102496674 vcpu_exit: 34636 1744462 38 182 hypercall: 45712 22878288 38 1307962 page_fault: 992 111104034 568 168688 With just one instruction (hcall): vcpu_entry: 1 942 942 942 guest_entry: 1 4044 4044 4044 in_guest: 1 1540 1540 1540 guest_exit: 1 3542 3542 3542 vcpu_exit: 1 80 80 80 hypercall: 0 0 0 0 page_fault: 0 0 0 0 === Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-6-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: Expose timing functions to module codeFabiano Rosas
The next patch adds new timing points to the P9 entry path, some of which are in the module code, so we need to export the timing functions. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-5-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry pathFabiano Rosas
We are currently doing the timing for debug purposes of the P9 entry path using the accumulators and terminology defined by the old entry path for P8 machines. Not only the "real-mode" and "napping" mentions are out of place for the P9 Radix entry path but also we cannot change them because the timing code is coupled to the structures defined in struct kvm_vcpu_arch. Add a new CONFIG_KVM_BOOK3S_HV_P9_TIMING to enable the timing code for the P9 entry path. For now, just add the new CONFIG and duplicate the structures. A subsequent patch will add the P9 changes. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-4-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: Add a new config for P8 debug timingFabiano Rosas
Turn the existing Kconfig KVM_BOOK3S_HV_EXIT_TIMING into KVM_BOOK3S_HV_P8_TIMING in preparation for the addition of a new config for P9 timings. This applies only to P8 code, the generic timing code is still kept under KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-3-farosas@linux.ibm.com
2022-06-29KVM: PPC: Book3S HV: Fix "rm_exit" entry in debugfs timingsFabiano Rosas
At debugfs/kvm/<pid>/vcpu0/timings we show how long each part of the code takes to run: $ cat /sys/kernel/debug/kvm/*-*/vcpu0/timings rm_entry: 123785 49398892 118 4898 rm_intr: 123780 6075890 22 390 rm_exit: 0 0 0 0 <-- NOK guest: 123780 46732919988 402 9997638 cede: 0 0 0 0 <-- OK, no cede napping in P9 The "rm_exit" is always showing zero because it is the last one and end_timing does not increment the counter of the previous entry. We can fix it by calling accumulate_time again instead of end_timing. That way the counter gets incremented. The rest of the arithmetic can be ignored because there are no timing points after this and the accumulators are reset before the next round. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-2-farosas@linux.ibm.com
2022-06-29powerpc/64e: KASAN Full support for BOOK3E/64Christophe Leroy
We now have memory organised in a way that allows implementing KASAN. Unlike book3s/64, book3e always has translation active so the only thing needed to use KASAN is to setup an early zero shadow mapping just after setting a stack pointer and before calling early_setup(). The memory layout is now as follows +------------------------+ Kernel virtual map end (0xc000200000000000) | | | 16TB of KASAN map | | | +------------------------+ Kernel KASAN shadow map start | | | 16TB of IO map | | | +------------------------+ Kernel IO map start | | | 16TB of vmemmap | | | +------------------------+ Kernel vmemmap start | | | 16TB of vmap | | | +------------------------+ Kernel virt start (0xc000100000000000) | | | 64TB of linear mem | | | +------------------------+ Kernel linear (0xc.....) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0bef8beda27baf71e3b9e8b13e620fba6e19499b.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Reorganise virtual memoryChristophe Leroy
Reduce the size of IO map in order to leave the last quarter of virtual MAP for KASAN shadow mapping. This gives the following layout. +------------------------+ Kernel virtual map end (0xc000200000000000) | | | 16TB (unused) | | | +------------------------+ Kernel IO map end | | | 16TB of IO map | | | +------------------------+ Kernel IO map start | | | 16TB of vmemmap | | | +------------------------+ Kernel vmemmap start | | | 16TB of vmap | | | +------------------------+ Kernel virt start (0xc000100000000000) | | | 64TB of linear mem | | | +------------------------+ Kernel linear (0xc.....) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/54ef01673bf14228106afd629f795c83acb9a00c.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Move virtual memory closer to linear memoryChristophe Leroy
Today nohash/64 have linear memory based at 0xc000000000000000 and virtual memory based at 0x8000000000000000. In order to implement KASAN, we need to regroup both areas. Move virtual memmory at 0xc000100000000000. This complicates a bit TLB miss handlers. Until now, memory region was easily identified with the 4 higher bits of address: - 0 ==> User - c ==> Linear Memory - 8 ==> Virtual Memory Now we need to rely on the 20 higher bits, with: - 0xxxx ==> User - c0000 ==> Linear Memory - c0001 ==> Virtual Memory Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4b225168031449fc34fc7132f3923cc8dc54af60.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Remove unused REGION related macrosChristophe Leroy
Those macros are not used anywhere. Remove them as they are soon going to be wrong and are not worth modifying as they are not used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f0efde8cee0924c3991790042b176ac77ad35e1f.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Remove MMU_FTR_USE_TLBRSRV and MMU_FTR_USE_PAIRED_MASChristophe Leroy
Commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated pieces") removed the last CPU having features MMU_FTRS_A2 and commit cd68098bcedd ("powerpc: Clean up MMU_FTRS_A2 and MMU_FTR_TYPE_3E") removed MMU_FTRS_A2 which was the last user of MMU_FTR_USE_TLBRSRV and MMU_FTR_USE_PAIRED_MAS. Remove all code that relies on MMU_FTR_USE_TLBRSRV and MMU_FTR_USE_PAIRED_MAS. With this change done, TLB miss can happen before the mmu feature fixups. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cfd5a0ecdb1598da968832e1bddf7431ec267200.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Fix early TLB miss with KUAPChristophe Leroy
With KUAP, the TLB miss handler bails out when an access to user memory is performed with a nul TID. But the normal TLB miss routine which is only used early during boot does the check regardless for all memory areas, not only user memory. By chance there is no early IO or vmalloc access, but when KASAN come we will start having early TLB misses. Fix it by creating a special branch for user accesses similar to the one in the 'bolted' TLB miss handlers. Unfortunately SPRN_MAS1 is now read too early and there are no registers available to preserve it so it will be read a second time. Fixes: 57bc963837f5 ("powerpc/kuap: Wire-up KUAP on book3e/64") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8d6c5859a45935d6e1a336da4dc20be421e8cea7.1656427701.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/ptdump: Fix display of RW pages on FSL_BOOK3EChristophe Leroy
On FSL_BOOK3E, _PAGE_RW is defined with two bits, one for user and one for supervisor. As soon as one of the two bits is set, the page has to be display as RW. But the way it is implemented today requires both bits to be set in order to display it as RW. Instead of display RW when _PAGE_RW bits are set and R otherwise, reverse the logic and display R when _PAGE_RW bits are all 0 and RW otherwise. This change has no impact on other platforms as _PAGE_RW is a single bit on all of them. Fixes: 8eb07b187000 ("powerpc/mm: Dump linux pagetables") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0c33b96317811edf691e81698aaee8fa45ec3449.1656427391.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/64e: Rewrite p4d_populate() as a static inline functionChristophe Leroy
Rewrite p4d_populate() as a static inline function instead of a macro. This change allows typechecking and would have helped detecting a recently found bug in map_kernel_page(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1b416f8a8fe1bc3f4e01175680ce310b7eb3a1e4.1655974565.git.christophe.leroy@csgroup.eu
2022-06-29powerpc: Remove _PAGE_SAO stub for book3e/64Christophe Leroy
Since commit 634093c59a12 ("powerpc/mm: enable ARCH_HAS_VM_GET_PAGE_PROT"), _PAGE_SAO is used only in arch/powerpc/mm/book3s64/pgtable.c The _PAGE_SAO stub defined as 0 for book3e/64 can be removed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/715e644fb3c7d992c0b71f6165ab6cf8c682055a.1655706069.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/32: Remove __map_without_ltlbsChristophe Leroy
__map_without_ltlbs is used only for 40x, and only when STRICT_KERNEL_RWX, KFENCE or DEBUG_PAGEALLOC is active. Do the verification directly in 40x version of mmu_mapin_ram() and remove __map_without_ltlbs from core ppc32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3422094db965d218c4c3d8580f526963a9ac897f.1655202721.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/32: Remove 'noltlbs' kernel parameterChristophe Leroy
Mapping without large TLBs has no added value on the 8xx. Mapping without large TLBs is still necessary on 40x when selecting CONFIG_KFENCE or CONFIG_DEBUG_PAGEALLOC or CONFIG_STRICT_KERNEL_RWX, but this is done automatically and doesn't require user selection. Remove 'noltlbs' kernel parameter, the user has no reason to use it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80ca17bd39cf608a8ebd0764d7064a498e131199.1655202721.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/32: Remove the 'nobats' kernel parameterChristophe Leroy
Mapping without BATs doesn't bring any added value to the user. Remove that option. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6977314c823cfb728bc0273cea634b41807bfb64.1655202721.git.christophe.leroy@csgroup.eu
2022-06-29powerpc: Restore CONFIG_DEBUG_INFO in defconfigsChristophe Leroy
Commit f9b3cd245784 ("Kconfig.debug: make DEBUG_INFO selectable from a choice") broke the selection of CONFIG_DEBUG_INFO by powerpc defconfigs. It is now necessary to select one of the three DEBUG_INFO_DWARF* options to get DEBUG_INFO enabled. Replace DEBUG_INFO=y by DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y in all defconfigs using the following command: sed -i s/DEBUG_INFO=y/DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y/g `git grep -l DEBUG_INFO arch/powerpc/configs/` Fixes: f9b3cd245784 ("Kconfig.debug: make DEBUG_INFO selectable from a choice") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/98a4c2603bf9e4b776e219f5b8541d23aa24e854.1654930308.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/irq: Simplify __do_irq()Christophe Leroy
Remove duplicated code by implementing a proper if/else. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5a3b21311191f1240850db6ab29b19ac7885fe03.1654769775.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/irq: Perform stack_overflow detection after switching to IRQ stackChristophe Leroy
When KASAN is enabled, as shown by the Oops below, the 2k limit is not enough to allow stack dump after a stack overflow detection when CONFIG_DEBUG_STACKOVERFLOW is selected: do_IRQ: stack overflow: 1984 CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1 Call Trace: Oops: Kernel stack overflow, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: sr_mod cdrom radeon(+) ohci_pci(+) hwmon i2c_algo_bit drm_ttm_helper ttm drm_dp_helper snd_aoa_i2sbus snd_aoa_soundbus snd_pcm ehci_pci snd_timer ohci_hcd snd ssb ehci_hcd 8250_pci soundcore drm_kms_helper pcmcia 8250 pcmcia_core syscopyarea usbcore sysfillrect 8250_base sysimgblt serial_mctrl_gpio fb_sys_fops usb_common pkcs8_key_parser fuse drm drm_panel_orientation_quirks configfs CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1 NIP: c02e5558 LR: c07eb3bc CTR: c07f46a8 REGS: e7fe9f50 TRAP: 0000 Not tainted (5.18.0-gentoo-PMacG4) MSR: 00001032 <ME,IR,DR,RI> CR: 44a14824 XER: 20000000 GPR00: c07eb3bc eaa1c000 c26baea0 eaa1c0a0 00000008 00000000 c07eb3bc eaa1c010 GPR08: eaa1c0a8 04f3f3f3 f1f1f1f1 c07f4c84 44a14824 0080f7e4 00000005 00000010 GPR16: 00000025 eaa1c154 eaa1c158 c0dbad64 00000020 fd543810 eaa1c0a0 eaa1c29e GPR24: c0dbad44 c0db8740 05ffffff fd543802 eaa1c150 c0c9a3c0 eaa1c0a0 c0c9a3c0 NIP [c02e5558] kasan_check_range+0xc/0x2b4 LR [c07eb3bc] format_decode+0x80/0x604 Call Trace: [eaa1c000] [c07eb3bc] format_decode+0x80/0x604 (unreliable) [eaa1c070] [c07f4dac] vsnprintf+0x128/0x938 [eaa1c110] [c07f5788] sprintf+0xa0/0xc0 [eaa1c180] [c0154c1c] __sprint_symbol.constprop.0+0x170/0x198 [eaa1c230] [c07ee71c] symbol_string+0xf8/0x260 [eaa1c430] [c07f46d0] pointer+0x15c/0x710 [eaa1c4b0] [c07f4fbc] vsnprintf+0x338/0x938 [eaa1c550] [c00e8fa0] vprintk_store+0x2a8/0x678 [eaa1c690] [c00e94e4] vprintk_emit+0x174/0x378 [eaa1c6d0] [c00ea008] _printk+0x9c/0xc0 [eaa1c750] [c000ca94] show_stack+0x21c/0x260 [eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90 [eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174 [eaa1c800] [c0009258] do_IRQ+0x20/0x34 [eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c ... As the detection is asynchronously performed at IRQs, we could be long after the limit has been crossed, so increasing the limit would not solve the problem completely. In order to be sure that there is enough stack space for the stack dump, do it after the switch to the IRQ stack. That way it is sure that the stack is large enough, unless the IRQ stack has been overfilled in which case the end of life is close. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c215d714329f475b431a6193369035aadfc0d182.1654769775.git.christophe.leroy@csgroup.eu
2022-06-29powerpc/irq: Make __do_irq() staticChristophe Leroy
Since commit 48cf12d88969 ("powerpc/irq: Inline call_do_irq() and call_do_softirq()"), __do_irq() is not used outside irq.c Reorder functions and make __do_irq() static and drop the declaration in irq.h. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/adbe1c8315ec2d63259f41468e82e51677bb1eda.1654769775.git.christophe.leroy@csgroup.eu