summaryrefslogtreecommitdiff
path: root/arch/powerpc/perf/core-book3s.c
AgeCommit message (Collapse)Author
2020-12-04powerpc/perf: MMCR0 control for PMU registers under PMCC=00Athira Rajeev
PowerISA v3.1 introduces new control bit (PMCCEXT) for restricting access to group B PMU registers in problem state when MMCR0 PMCC=0b00. In problem state and when MMCR0 PMCC=0b00, setting the Monitor Mode Control Register bit 54 (MMCR0 PMCCEXT), will restrict read permission on Group B Performance Monitor Registers (SIER, SIAR, SDAR and MMCR1). When this bit is set to zero, group B registers will be readable. In other platforms (like power9), the older behaviour is retained where group B PMU SPRs are readable. Patch adds support for MMCR0 PMCCEXT bit in power10 by enabling this bit during boot and during the PMU event enable/disable callback functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606409684-1589-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-04powerpc/perf: Invoke per-CPU variable access with disabled interruptsAthira Rajeev
The power_pmu_event_init() callback access per-cpu variable (cpu_hw_events) to check for event constraints and Branch Stack (BHRB). Current usage is to disable preemption when accessing the per-cpu variable, but this does not prevent timer callback from interrupting event_init. Fix this by using 'local_irq_save/restore' to make sure the code path is invoked with disabled interrupts. This change is tested in mambo simulator to ensure that, if a timer interrupt comes in during the per-cpu access in event_init, it will be soft masked and replayed later. For testing purpose, introduced a udelay() in power_pmu_event_init() to make sure a timer interrupt arrives while in per-cpu variable access code between local_irq_save/resore. As expected the timer interrupt was replayed later during local_irq_restore called from power_pmu_event_init. This was confirmed by adding breakpoint in mambo and checking the backtrace when timer_interrupt was hit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606814880-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-04powerpc/perf: Fix crash with is_sier_available when pmu is not setAthira Rajeev
On systems without any specific PMU driver support registered, running 'perf record' with —intr-regs will crash ( perf record -I <workload> ). The relevant portion from crash logs and Call Trace: Unable to handle kernel paging request for data at address 0x00000068 Faulting instruction address: 0xc00000000013eb18 Oops: Kernel access of bad area, sig: 11 [#1] CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1 NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80 REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le) NIP [c00000000013eb18] is_sier_available+0x18/0x30 LR [c000000000139f2c] perf_reg_value+0x6c/0xb0 Call Trace: [c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable) [c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0 [c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0 [c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0 [c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0 [c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480 [c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520 [c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0 [c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120 When perf record session is started with "-I" option, capturing registers on each sample calls is_sier_available() to check for the SIER (Sample Instruction Event Register) availability in the platform. This function in core-book3s accesses 'ppmu->flags'. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix the crash by returning false in is_sier_available() if ppmu is not set. Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-11-26Merge remote-tracking branch 'origin/master' into perf/corePeter Zijlstra
Further perf/core patches will depend on: d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding") which is already in Linus' tree.
2020-11-19powerpc/perf: Use regs->nip when SIAR is zeroMadhavan Srinivasan
In power10 DD1, there is an issue where the SIAR (Sampled Instruction Address Register) is not latching to the sampled address during random sampling. This results in value of 0s in the SIAR. Add a check to use regs->nip when SIAR is zero. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
2020-11-19powerpc/perf: Use the address from SIAR register to set cpumode flagsAthira Rajeev
While setting the processor mode for any sample, perf_get_misc_flags() expects the privilege level to differentiate the userspace and kernel address. On power10 DD1, there is an issue that causes MSR_HV MSR_PR bits of Sampled Instruction Event Register (SIER) not to be set for marked events. Hence add a check to use the address in SIAR (Sampled Instruction Address Register) to identify the privilege level. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
2020-11-19powerpc/perf: Drop the check for SIAR_VALIDAthira Rajeev
In power10 DD1, there is an issue that causes the SIAR_VALID bit of the SIER (Sampled Instruction Event Register) to not be set. But the SIAR_VALID bit is used for fetching the instruction address from the SIAR (Sampled Instruction Address Register), and marked events are sampled only if the SIAR_VALID bit is set. So drop the check for SIAR_VALID and return true always incase of power10 DD1. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
2020-10-29powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZEKan Liang
The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual address. Update the data->addr if the sample type is set. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201001135749.2804-4-kan.liang@linux.intel.com
2020-08-27powerpc/perf: Fix crashes with generic_compat_pmu & BHRBAlexey Kardashevskiy
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This add a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb() as the only caller checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0. Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
2020-08-20powerpc/perf: Fix soft lockups due to missed interrupt accountingAthira Rajeev
Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-17powerpc/perf: Add support for outputting extended regs in perf intr_regsAnju T Sudhakar
Add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Patch adds extended regs support for power9 platform by exposing MMCR0, MMCR1 and MMCR2 registers. REG_RESERVED mask needs update to include extended regs. PERF_REG_EXTENDED_MASK, contains mask value of the supported registers, is defined at runtime in the kernel based on platform since the supported registers may differ from one processor version to another and hence the MASK value. With the patch: available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0 ... intr regs: mask 0xffffffffffff ABI 64-bit .... r0 0xc00000000012b77c .... r1 0xc000003fe5e03930 .... r2 0xc000000001b0e000 .... r3 0xc000003fdcddf800 .... r4 0xc000003fc7880000 .... r5 0x9c422724be .... r6 0xc000003fe5e03908 .... r7 0xffffff63bddc8706 .... r8 0x9e4 .... r9 0x0 .... r10 0x1 .... r11 0x0 .... r12 0xc0000000001299c0 .... r13 0xc000003ffffc4800 .... r14 0x0 .... r15 0x7fffdd8b8b00 .... r16 0x0 .... r17 0x7fffdd8be6b8 .... r18 0x7e7076607730 .... r19 0x2f .... r20 0xc00000001fc26c68 .... r21 0xc0002041e4227e00 .... r22 0xc00000002018fb60 .... r23 0x1 .... r24 0xc000003ffec4d900 .... r25 0x80000000 .... r26 0x0 .... r27 0x1 .... r28 0x1 .... r29 0xc000000001be1260 .... r30 0x6008010 .... r31 0xc000003ffebb7218 .... nip 0xc00000000012b910 .... msr 0x9000000000009033 .... orig_r3 0xc00000000012b86c .... ctr 0xc0000000001299c0 .... link 0xc00000000012b77c .... xer 0x0 .... ccr 0x28002222 .... softe 0x1 .... trap 0xf00 .... dar 0x0 .... dsisr 0x80000000000 .... sier 0x0 .... mmcra 0x80000000000 .... mmcr0 0x82008090 .... mmcr1 0x1e000000 .... mmcr2 0x0 ... thread: perf:4784 Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <nasastry@in.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596794701-23530-2-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-22powerpc/perf: BHRB control to disable BHRB logic when not usedAthira Rajeev
PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB). BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This field controls whether BHRB entries are written when BHRB recording is enabled by other bits. This patch implements support for this BHRB disable bit. By setting 0b1 to this bit will disable the BHRB and by setting 0b0 to this bit will have BHRB enabled. This addresses backward compatibility (for older OS), since this bit will be cleared and hardware will be writing to BHRB by default. This patch addresses changes to set MMCRA (BHRBRD) at boot for power10 (there by the core will run faster) and enable this feature only on runtime ie, on explicit need from user. Also save/restore MMCRA in the restore path of state-loss idle state to make sure we keep BHRB disabled if it was not enabled on request at runtime. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Ignore the BHRB kernel address filtering for P10Athira Rajeev
Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") added a check in bhrb_read() to filter the kernel address from BHRB buffer. This patch modified it to avoid that check for PowerISA v3.1 based processors, since PowerISA v3.1 allows only MSR[PR]=1 address to be written to BHRB buffer. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-10-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: power10 Performance Monitoring supportAthira Rajeev
Base enablement patch to register performance monitoring hardware support for power10. Patch introduce the raw event encoding format, defines the supported list of events, config fields for the event attributes and their corresponding bit values which are exported via sysfs. Patch also enhances the support function in isa207_common.c to include power10 pmu hardware. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-9-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add Power10 PMU feature to DT CPU featuresMadhavan Srinivasan
Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add support for ISA3.1 PMU SPRsMadhavan Srinivasan
PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Update Power PMU cache_events to u64 typeAthira Rajeev
Events of type PERF_TYPE_HW_CACHE was described for Power PMU as: int (*cache_events)[type][op][result]; where type, op, result values unpacked from the event attribute config value is used to generate the raw event code at runtime. So far the event code values which used to create these cache-related events were within 32 bit and `int` type worked. In power10, some of the event codes are of 64-bit value and hence update the Power PMU cache_events to `u64` type in `power_pmu` struct. Also propagate this change to existing all PMU driver code paths which are using ppmu->cache_events. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-4-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Update cpu_hw_event to use `struct` for storing MMCR registersAthira Rajeev
core-book3s currently uses array to store the MMCR registers as part of per-cpu `cpu_hw_events`. This patch does a clean up to use `struct` to store mmcr regs instead of array. This will make code easier to read and reduces chance of any subtle bug that may come in the future, say when new registers are added. Patch updates all relevant code that was using MMCR array ( cpuhw->mmcr[x]) to use newly introduced `struct`. This includes the PMU driver code for supported platforms (power5 to power9) and ISA macros for counter support functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-2-git-send-email-atrajeev@linux.vnet.ibm.com
2020-06-17maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-19powerpc: Use a datatype for instructionsJordan Niethe
Currently unsigned ints are used to represent instructions on powerpc. This has worked well as instructions have always been 4 byte words. However, ISA v3.1 introduces some changes to instructions that mean this scheme will no longer work as well. This change is Prefixed Instructions. A prefixed instruction is made up of a word prefix followed by a word suffix to make an 8 byte double word instruction. No matter the endianness of the system the prefix always comes first. Prefixed instructions are only planned for powerpc64. Introduce a ppc_inst type to represent both prefixed and word instructions on powerpc64 while keeping it possible to exclusively have word instructions on powerpc32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Fix compile error in emulate_spe()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com
2020-02-11perf/core: Add new branch sample type for HW index of raw branch recordsKan Liang
The low level index is the index in the underlying hardware buffer of the most recently captured taken branch which is always saved in branch_entries[0]. It is very useful for reconstructing the call stack. For example, in Intel LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers. With the low level index information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample type to retrieve low level index of raw branch records. The low level index is between -1 (unknown) and max depth which can be retrieved in /sys/devices/cpu/caps/branches. Only when the new branch sample type is set, the low level index information is dumped into the PERF_SAMPLE_BRANCH_STACK output. Perf tool should check the attr.branch_sample_type, and apply the corresponding format for PERF_SAMPLE_BRANCH_STACK samples. Otherwise, some user case may be broken. For example, users may parse a perf.data, which include the new branch sample type, with an old version perf tool (without the check). Users probably get incorrect information without any warning. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
2020-01-26powerpc: use probe_user_read() and probe_user_write()Christophe Leroy
Instead of opencoding, use probe_user_read() to failessly read a user location and probe_user_write() for writing to user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
2019-10-17perf_event: Add support for LSM and SELinux checksJoel Fernandes (Google)
In current mainline, the degree of access to perf_event_open(2) system call depends on the perf_event_paranoid sysctl. This has a number of limitations: 1. The sysctl is only a single value. Many types of accesses are controlled based on the single value thus making the control very limited and coarse grained. 2. The sysctl is global, so if the sysctl is changed, then that means all processes get access to perf_event_open(2) opening the door to security issues. This patch adds LSM and SELinux access checking which will be used in Android to access perf_event_open(2) for the purposes of attaching BPF programs to tracepoints, perf profiling and other operations from userspace. These operations are intended for production systems. 5 new LSM hooks are added: 1. perf_event_open: This controls access during the perf_event_open(2) syscall itself. The hook is called from all the places that the perf_event_paranoid sysctl is checked to keep it consistent with the systctl. The hook gets passed a 'type' argument which controls CPU, kernel and tracepoint accesses (in this context, CPU, kernel and tracepoint have the same semantics as the perf_event_paranoid sysctl). Additionally, I added an 'open' type which is similar to perf_event_paranoid sysctl == 3 patch carried in Android and several other distros but was rejected in mainline [1] in 2016. 2. perf_event_alloc: This allocates a new security object for the event which stores the current SID within the event. It will be useful when the perf event's FD is passed through IPC to another process which may try to read the FD. Appropriate security checks will limit access. 3. perf_event_free: Called when the event is closed. 4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event. 5. perf_event_write: Called from the ioctl(2) syscalls for the event. [1] https://lwn.net/Articles/696240/ Since Peter had suggest LSM hooks in 2016 [1], I am adding his Suggested-by tag below. To use this patch, we set the perf_event_paranoid sysctl to -1 and then apply selinux checking as appropriate (default deny everything, and then add policy rules to give access to domains that need it). In the future we can remove the perf_event_paranoid sysctl altogether. Suggested-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: James Morris <jmorris@namei.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: rostedt@goodmis.org Cc: Yonghong Song <yhs@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: jeffv@google.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: primiano@google.com Cc: Song Liu <songliubraving@fb.com> Cc: rsavitski@google.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Matthew Garrett <matthewgarrett@google.com> Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
2019-06-02Merge tag 'powerpc-5.2-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A minor fix to our IMC PMU code to print a less confusing error message when the driver can't initialise properly. A fix for a bug where a user requesting an unsupported branch sampling filter can corrupt PMU state, preventing the PMU from counting properly. And finally a fix for a bug in our support for kexec_file_load(), which prevented loading a kernel and initramfs. Most versions of kexec don't yet use kexec_file_load(). Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi Bangoria, Thiago Jung Bauermann" * tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() powerpc/perf: Fix MMCRA corruption by bhrb_filter powerpc/powernv: Return for invalid IMC domain
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22powerpc/perf: Fix MMCRA corruption by bhrb_filterRavi Bangoria
Consider a scenario where user creates two events: 1st event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; fd = perf_event_open(attr, 0, 1, -1, 0); This sets cpuhw->bhrb_filter to 0 and returns valid fd. 2nd event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; fd = perf_event_open(attr, 0, 1, -1, 0); It overrides cpuhw->bhrb_filter to -1 and returns with error. Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Add generic compat mode pmu driverMadhavan Srinivasan
Most of the power processor generation performance monitoring unit (PMU) driver code is bundled in the kernel and one of those is enabled/registered based on the oprofile_cpu_type check at the boot. But things get little tricky incase of "compat" mode boot. IBM POWER System Server based processors has a compactibility mode feature, which simpily put is, Nth generation processor (lets say POWER8) will act and appear in a mode consistent with an earlier generation (N-1) processor (that is POWER7). And in this "compat" mode boot, kernel modify the "oprofile_cpu_type" to be Nth generation (POWER8). If Nth generation pmu driver is bundled (POWER8), it gets registered. Key dependency here is to have distro support for latest processor performance monitoring support. Patch here adds a generic "compat-mode" performance monitoring driver to be register in absence of powernv platform specific pmu driver. Driver supports only "cycles" and "instruction" events. "0x0001e" used as event code for "cycles" and "0x00002" used as event code for "instruction" events. New file called "generic-compat-pmu.c" is created to contain the driver specific code. And base raw event code format modeled on PPMU_ARCH_207S. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: init pmu from core-book3sMadhavan Srinivasan
Currenty pmu driver file for each ppc64 generation processor has a __init call in itself. Refactor the code by moving the __init call to core-books.c. This also clean's up compat mode pmu driver registration. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21Powerpc/perf: Wire up PMI throttlingRavi Bangoria
Commit 14c63f17b1fde ("perf: Drop sample rate when sampling is too slow") introduced a way to throttle PMU interrupts if we're spending too much time just processing those. Wire up powerpc PMI handler to use this infrastructure. We have throttling of the *rate* of interrupts, but this adds throttling based on the *time taken* to process the interrupts. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Add constraints for power9 l2/l3 bus eventsMadhavan Srinivasan
In previous generation processors, both bus events and direct events of performance monitoring unit can be individually programmabled and monitored in PMCs. But in Power9, L2/L3 bus events are always available as a "bank" of 4 events. To obtain the counts for any of the l2/l3 bus events in a given bank, the user will have to program PMC4 with corresponding l2/l3 bus event for that bank. Patch enforce two contraints incase of L2/L3 bus events. 1)Any L2/L3 event when programmed is also expected to program corresponding PMC4 event from that group. 2)PMC4 event should always been programmed first due to group constraint logic limitation For ex. consider these L3 bus events PM_L3_PF_ON_CHIP_MEM (0x460A0), PM_L3_PF_MISS_L3 (0x160A0), PM_L3_CO_MEM (0x260A0), PM_L3_PF_ON_CHIP_CACHE (0x360A0), 1) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r160A0,r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < > 2) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r260A0,r360A0}" < > 3) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r360A0}" < > Patch here implements group constraint logic suggested by Michael Ellerman. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Update perf_regs structure to include SIERMadhavan Srinivasan
On each sample, Sample Instruction Event Register (SIER) content is saved in pt_regs. SIER does not have a entry as-is in the pt_regs but instead, SIER content is saved in the "dar" register of pt_regs. Patch adds another entry to the perf_regs structure to include the "SIER" printing which internally maps to the "dar" of pt_regs. It also check for the SIER availability in the platform and present value accordingly Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-17Merge tag 'powerpc-4.19-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - A fix for a bug in our page table fragment allocator, where a page table page could be freed and reallocated for something else while still in use, leading to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for a powerpc only refcount. - Fixes to our pkey support. Several are user-visible changes, but bring us in to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer for reporting many of these. - A series to improve the hvc driver & related OPAL console code, which have been seen to cause hardlockups at times. The hvc driver changes in particular have been in linux-next for ~month. - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y. - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use anywhere other than as a paper weight. - An optimised memcmp implementation using Power7-or-later VMX instructions - Support for barrier_nospec on some NXP CPUs. - Support for flushing the count cache on context switch on some IBM CPUs (controlled by firmware), as a Spectre v2 mitigation. - A series to enhance the information we print on unhandled signals to bring it into line with other arches, including showing the offending VMA and dumping the instructions around the fault. Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand, Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao, zhong jiang" * tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits) powerpc/mm/book3s/radix: Add mapping statistics powerpc/uaccess: Enable get_user(u64, *p) on 32-bit powerpc/mm/hash: Remove unnecessary do { } while(0) loop powerpc/64s: move machine check SLB flushing to mm/slb.c powerpc/powernv/idle: Fix build error powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range powerpc/mm: remove warning about ‘type’ being set powerpc/32: Include setup.h header file to fix warnings powerpc: Move `path` variable inside DEBUG_PROM powerpc/powermac: Make some functions static powerpc/powermac: Remove variable x that's never read cxl: remove a dead branch powerpc/powermac: Add missing include of header pmac.h powerpc/kexec: Use common error handling code in setup_new_fdt() powerpc/xmon: Add address lookup for percpu symbols powerpc/mm: remove huge_pte_offset_and_shift() prototype powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled powerpc/pseries: Fix endianness while restoring of r3 in MCE handler. powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements powerpc/fadump: handle crash memory ranges array index overflow ...
2018-07-16powerpc/64s: Remove POWER9 DD1 supportNicholas Piggin
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16perf, tools: Use correct articles in commentsTobias Tefke
Some of the comments in the perf events code use articles incorrectly, using 'a' for words beginning with a vowel sound, where 'an' should be used. Signed-off-by: Tobias Tefke <tobias.tefke@tutanota.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: jolsa@redhat.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/20180709105715.22938-1-tobias.tefke@tutanota.com [ Fix a few more perf related 'a event' typo fixes from all around the kernel and tooling tree. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-07Merge tag 'powerpc-4.17-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun" * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits) powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep powerpc/64s: Fix POWER9 DD2.2 and above in cputable features powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} powerpc: io.h: move iomap.h include so that it can use readq/writeq defs cxl: Fix possible deadlock when processing page faults from cxllib powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it powerpc/mm/radix: Update command line parsing for disable_radix powerpc/mm/radix: Parse disable_radix commandline correctly. powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix powerpc/mm/keys: Update documentation and remove unnecessary check powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() powerpc/powernv: Always stop secondaries before reboot/shutdown powerpc: hard disable irqs in smp_send_stop loop powerpc: use NMI IPI for smp_send_stop powerpc/powernv: Fix SMT4 forcing idle code ...
2018-03-27powerpc/perf: Infrastructure to support addition of blacklisted eventsMadhavan Srinivasan
Introduce code to support addition of blacklisted events for a processor version. Blacklisted events are events that are known to not count correctly on that CPU revision, and so should be prevented from being counted so as to avoid user confusion. A 'pointer' and 'int' variable to hold the number of events are added to 'struct power_pmu', along with a generic function to loop through the list to validate the given event. Generic function 'is_event_blacklisted' is called in power_pmu_event_init() to detect and reject early. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/perf: Prevent kernel address leak via perf_get_data_addr()Madhavan Srinivasan
Sampled Data Address Register (SDAR) is a 64-bit register that contains the effective address of the storage operand of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. In certain scenario SDAR happen to contain the kernel address even for userspace only sampling. Add checks to prevent it. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/perf: Prevent kernel address leak to userspace via BHRB bufferMadhavan Srinivasan
The current Branch History Rolling Buffer (BHRB) code does not check for any privilege levels before updating the data from BHRB. This could leak kernel addresses to userspace even when profiling only with userspace privileges. Add proper checks to prevent it. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/perf: Fix kernel address leak via sampling registersMichael Ellerman
Current code in power_pmu_disable() does not clear the sampling registers like Sampling Instruction Address Register (SIAR) and Sampling Data Address Register (SDAR) after disabling the PMU. Since these are userspace readable and could contain kernel addresses, add code to explicitly clear the content of these registers. Also add a "context synchronizing instruction" to enforce no further updates to these registers as suggested by Power ISA v3.0B. From section 9.4, on page 1108: "If an mtspr instruction is executed that changes the value of a Performance Monitor register other than SIAR, SDAR, and SIER, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 11. "Synchronization Requirements for Context Alterations" on page 1133)." Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Massage change log and add ISA reference] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-16perf: Fix sibling iterationPeter Zijlstra
Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
2018-03-12perf/core: Remove perf_event::group_entryPeter Zijlstra
Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-21Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-19powerpc/64: Change soft_enabled from flag to bitmaskMadhavan Srinivasan
"paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Add #defines for paca->soft_enabled flagsMadhavan Srinivasan
Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13powerpc/perf: Dereference BHRB entries safelyRavi Bangoria
It's theoretically possible that branch instructions recorded in BHRB (Branch History Rolling Buffer) entries have already been unmapped before they are processed by the kernel. Hence, trying to dereference such memory location will result in a crash. eg: Unable to handle kernel paging request for data at address 0xd000000019c41764 Faulting instruction address: 0xc000000000084a14 NIP [c000000000084a14] branch_target+0x4/0x70 LR [c0000000000eb828] record_and_restart+0x568/0x5c0 Call Trace: [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable) [c0000000000ec378] perf_event_interrupt+0x298/0x460 [c000000000027964] performance_monitor_exception+0x54/0x70 [c000000000009ba4] performance_monitor_common+0x114/0x120 Fix it by deferefencing the addresses safely. Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB") Cc: stable@vger.kernel.org # v3.10+ Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Use probe_kernel_read() which is clearer, tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04powerpc/perf: Fix oops when grouping different pmu eventsRavi Bangoria
When user tries to group imc (In-Memory Collections) event with normal event, (sometime) kernel crashes with following log: Faulting instruction address: 0x00000000 [link register ] c00000000010ce88 power_check_constraints+0x128/0x980 ... c00000000010e238 power_pmu_event_init+0x268/0x6f0 c0000000002dc60c perf_try_init_event+0xdc/0x1a0 c0000000002dce88 perf_event_alloc+0x7b8/0xac0 c0000000002e92e0 SyS_perf_event_open+0x530/0xda0 c00000000000b004 system_call+0x38/0xe0 'event_base' field of 'struct hw_perf_event' is used as flags for normal hw events and used as memory address for imc events. While grouping these two types of events, collect_events() tries to interpret imc 'event_base' as a flag, which causes a corruption resulting in a crash. Consider only those events which belongs to 'perf_hw_context' in collect_events(). Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sysrq: Fix oops whem ppmu is not registeredRavi Bangoria
Kernel crashes if power pmu is not registered and user tries to dump regs with 'echo p > /proc/sysrq-trigger'. Sample log: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000000d52f0 NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230 LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50 Call Trace: printk+0x38/0x4c (unreliable) __handle_sysrq+0xe4/0x270 write_sysrq_trigger+0x64/0x80 proc_reg_write+0x80/0xd0 __vfs_write+0x40/0x200 vfs_write+0xc8/0x240 SyS_write+0x60/0x110 system_call+0x58/0x6c Fixes: 5f6d0380c640 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>