summaryrefslogtreecommitdiff
path: root/drivers/perf
AgeCommit message (Collapse)Author
4 daysMerge tag 'riscv-for-linus-6.18-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other architectures have already merged this type of cleanup) - The introduction of ioremap_wc() for RISC-V - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than open code - A RISC-V kprobes unit test - An architecture-specific endianness swap macro set implementation, leveraging some dedicated RISC-V instructions for this purpose if they are available - The ability to identity and communicate to userspace the presence of a MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE implementation in cpu_relax() - Several other miscellaneous cleanups * tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits) riscv: errata: Fix the PAUSE Opcode for MIPS P8700 riscv: hwprobe: Document MIPS xmipsexectl vendor extension riscv: hwprobe: Add MIPS vendor extension probing riscv: Add xmipsexectl instructions riscv: Add xmipsexectl as a vendor extension dt-bindings: riscv: Add xmipsexectl ISA extension description riscv: cpufeature: add validation for zfa, zfh and zfhmin perf: riscv: skip empty batches in counter start selftests: riscv: Add README for RISC-V KSelfTest riscv: sbi: Switch to new sys-off handler API riscv: Move vendor errata definitions to new header RISC-V: ACPI: enable parsing the BGRT table riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG riscv: pi: use 'targets' instead of extra-y in Makefile riscv: introduce asm/swab.h riscv: mmap(): use unsigned offset type in riscv_sys_mmap drivers/perf: riscv: Remove redundant ternary operators riscv: mm: Use mmu-type from FDT to limit SATP mode riscv: mm: Return intended SATP mode for noXlvl options riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM ...
10 daysperf/dwc_pcie: Fix use of uninitialized variableIlkka Koskinen
Fix use of uninitialized variable in group validation code. Fixes: 71396cfac97d ("perf/dwc_pcie: Support counting multiple lane events in parallel") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202509231223.gZsX6Eio-lkp@intel.com/ Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Add support for L3C PMU v3Yicong Yang
This patch adds support for L3C PMU v3. The v3 L3C PMU supports an extended events space which can be controlled in up to 2 extra address spaces with separate overflow interrupts. The layout of the control/event registers are kept the same. The extended events with original ones together cover the monitoring job of all transactions on L3C. The extended events is specified with `ext=[1|2]` option for the driver to distinguish, like below: perf stat -e hisi_sccl0_l3c0_0/event=<event_id>,ext=1/ Currently only event option using config bit [7, 0]. There's still plenty unused space. Make ext using config [16, 17] and reserve bit [15, 8] for event option for future extension. With the capability of extra counters, number of counters for HiSilicon uncore PMU could reach up to 24, the usedmap is extended accordingly. The hw_perf_event::event_base is initialized to the base MMIO address of the event and will be used for later control, overflow handling and counts readout. We still make use of the Uncore PMU framework for handling the events and interrupt migration on CPU hotplug. The framework's cpuhp callback will handle the event migration and interrupt migration of orginial event, if PMU supports extended events then the interrupt of extended events is migrated to the same CPU choosed by the framework. A new HID of HISI0215 is used for this version of L3C PMU. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Co-developed-by: Yushan Wang <wangyushan12@huawei.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Refactor the event configuration of L3C PMUYicong Yang
The event register is configured using hisi_pmu::base directly since only one address space is supported for L3C PMU. We need to extend if events configuration locates in different address space. In order to make preparation for such hardware, extract the event register configuration to separate function using hw_perf_event::event_base as each event's base address. Implement a private hisi_uncore_ops::get_event_idx() callback for initialize the event_base besides get the hardware index. No functional changes intended. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Extend the field of tt_coreYicong Yang
Currently the tt_core's using config1's bit [7, 0] and can not be extended. For some platforms there's more the 8 CPUs sharing the L3 cache. So make tt_core use config2's bit [15, 0] and the remaining bits in config2 is reserved for extension. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Extract the event filter check of L3C PMUYicong Yang
L3C PMU has 4 filter options which are sharing perf_event_attr::config1. Driver will check config1 to see whether a certain event has a filter setting. It'll be incorrect if we make use of other bits in config1 for non-filter options. So check whether each filter options are set directly in a separate function instead. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Simplify the probe process of each L3C PMU versionYicong Yang
Version 1 and 2 of L3C PMU also use different HID. Make use of struct acpi_device_id::driver_data for version specific information rather than judge the version register. This will help to simplify the probe process and also a bit easier for extension. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Export hisi_uncore_pmu_isr()Yicong Yang
Currently Uncore PMU framework assume one PMU device only have one interrupt and will help register the interrupt handler. It cannot support a PMU with multiple interrupt resources. An uncore PMU may have multiple interrupts that can share the same handler. Export hisi_uncore_pmu_isr() to allow drivers register the irq handler by their own routine. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysdrivers/perf: hisi: Relax the event ID check in the frameworkYicong Yang
Event ID is only using the attr::config bit [7, 0] but we check the event range using the whole 64bit field. It blocks the usage of the rest field of attr::config. Relax the check by only using the bit [7, 0]. Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yushan Wang <wangyushan12@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
12 daysperf: Fujitsu: Add the Uncore PMU driverKoichi Okuno
This adds a new dynamic PMU to the Perf Events framework to program and control the Uncore PMUs in Fujitsu chips. This driver exports formatting and event information to sysfs so it can be used by the perf user space tools with the syntaxes: perf stat -e pci_iod0_pci0/ea-pci/ ls perf stat -e pci_iod0_pci0/event=0x80/ ls perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls perf stat -e mac_iod0_mac0_ch0/event=0x80/ ls FUJITSU-MONAKA PMU Events Specification v1.1 URL: https://github.com/fujitsu/FUJITSU-MONAKA Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Koichi Okuno <fj2767dz@fujitsu.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: riscv: skip empty batches in counter startYunhui Cui
Avoid unnecessary SBI calls when starting non-overflowed counters in pmu_sbi_start_ovf_ctrs_sbi() by checking ctr_start_mask. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250804025110.11088-1-cuiyunhui@bytedance.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18perf/arm-cmn: Fix CMN S3 DTM offsetRobin Murphy
CMN S3's DTM offset is different between r0px and r1p0, and it turns out this was not a error in the earlier documentation, but does actually exist in the design. Lovely. Cc: stable@vger.kernel.org Fixes: 0dc2f4963f7e ("perf/arm-cmn: Support CMN S3") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: arm_spe: Prevent overflow in PERF_IDX2OFF()Leo Yan
Cast nr_pages to unsigned long to avoid overflow when handling large AUX buffer sizes (>= 2 GiB). Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension") Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18drivers/perf: riscv: Remove redundant ternary operatorsLiao Yuanhong
For ternary operators in the form of "a ? true : false", if 'a' itself returns a boolean result, the ternary operator can be omitted. Remove redundant ternary operators to clean up the code. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250828122510.30843-1-liaoyuanhong@vivo.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18drivers/perf: hisi: Add support for HiSilicon MN PMU driverJunhao He
MN (Miscellaneous Node) is a hybrid node in ARM CHI. It broadcasts the following two types of requests: DVM operations and PCIe configuration. MN PMU devices exist on both SCCL and SICL, so we named the MN pmu driver after SCL (Super cluster) ID. The MN PMU driver using the HiSilicon uncore PMU framework. And only the event parameter is supported. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18drivers/perf: hisi: Add support for HiSilicon NoC PMUYicong Yang
Adds the support for HiSilicon NoC (Network on Chip) PMU which will be used to monitor the events on the system bus. The PMU device will be named after the SCL ID (either Super CPU cluster or Super IO cluster) and the index ID, just similar to other HiSilicon Uncore PMUs. Below PMU formats are provided besides the event: - ch: the transaction channel (data, request, response, etc) which can be used to filter the counting. - tt_en: tracetag filtering enable. Just as other HiSilicon Uncore PMUs the NoC PMU supports only counting the transactions with tracetag. The NoC PMU doesn't have an interrupt to indicate the overflow. However we have a 64 bit counter which is large enough and it's nearly impossible to overflow. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditionsYicong Yang
PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain conditions. Factor out the condition check to a separate function for further extension. Add documents for better understanding. No functional changes intended. Reviewed-by: James Clark <james.clark@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: arm_spe: Add support for FEAT_SPE_EFT extended filteringJames Clark
FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing load, store and branch filters. It also adds two new filter bits for SIMD and floating point with their own associated mask bits. The current filters only allow OR filtering on samples that are load OR store etc, and the new mask bits allow setting part of the filter to an AND, for example filtering samples that are store AND SIMD. With mask bits set to 0, the OR behavior is preserved, so the unless any masks are explicitly set old filters will behave the same. Add them all and make them behave the same way as existing format bits, hidden and return EOPNOTSUPP if set when the feature doesn't exist. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: arm_spe: Expose event filterLeo Yan
Expose an "event_filter" entry in the caps folder to inform user space about which events can be filtered. Change the return type of arm_spe_pmu_cap_get() from u32 to u64 to accommodate the added event filter entry. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: arm_spe: Support FEAT_SPEv1p4 filtersJames Clark
FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits and also makes some previously available bits unavailable again e.g: E[30], bit [30] When FEAT_SPEv1p4 is _not_ implemented ... Continuing to hard code the valid filter bits for each version isn't scalable, and it also doesn't work for filter bits that aren't related to SPE version. For example most bits have a further condition: E[15], bit [15] When ... and filtering on event 15 is supported: Whether "filtering on event 15" is implemented or not is only discoverable from the TRM of that specific CPU or by probing PMSEVFR_EL1. Instead of hard coding them, write all 1s to the PMSEVFR_EL1 register and read it back to discover the RES0 bits. Unsupported bits are RAZ/WI so should read as 0s. For any hardware that doesn't strictly follow RAZ/WI for unsupported filters: Any bits that should have been supported in a specific SPE version but now incorrectly appear to be RES0 wouldn't have worked anyway, so it's better to fail to open events that request them rather than behaving unexpectedly. Bits that aren't implemented but also aren't RAZ/WI will be incorrectly reported as supported, but allowing them to be used is harmless. Testing on N1SDP shows the probed RES0 bits to be the same as the hard coded ones. The FVP with SPEv1p4 shows only additional new RES0 bits, i.e. no previously hard coded RES0 bits are missing. Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf/dwc_pcie: Support counting multiple lane events in parallelIlkka Koskinen
While Designware PCIe PMU allows to count only one time based event at a time, it allows to count all the lane events simultaneously. After the patch one is able to count a group of lane events: $ perf stat -e '{dwc_rootport/tx_memory_write,lane=1/,dwc_rootport/rx_memory_read,lane=0/}' dd if=/dev/nvme0n1 of=/dev/null bs=1M count=1 Earlier the events wouldn't have been counted successfully. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18drivers: perf: use us_to_ktime() where appropriateXichao Zhao
The arm_ccn_pmu_poll_period_us are more suitable for using the us_to_ktime(). This can make the code more concise and enhance readability. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18perf: imx_perf: add support for i.MX94 platformXu Yang
Add compatible string and related devtype for i.MX94 platform. Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Support PMUs with no interruptYicong Yang
We'll have PMUs don't have an interrupt to indicate the counter overflow, but the Uncore PMU core assume all the PMUs have interrupt. So handle this case in the core. The existing PMUs won't be affected. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-7-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Relax the event number check of v2 PMUsJunhao He
The supported event number range of each Uncore PMUs is provided by each driver in hisi_pmu::check_event and out of range events will be rejected. A later version with expanded event number range needs to register the PMU with updated hisi_pmu::check_event even if it's the only update, which means the expanded events cannot be used unless the driver's updated. However the unsupported events won't be counted by the hardware so we can relax the event number check to allow the use the expanded events. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-6-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driverJunhao He
SLLC v3 PMU has the following changes compared to previous version: a) update the register layout b) update the definition of SRCID_CTRL and TGTID_CTRL registers. To be compatible with v2, we use maximum width (11 bits) and mask the extra length for themselves. c) remove latency events (driver does not need to be adapted). SLLC v3 PMU is identified with HID HISI0264. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-5-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU informationJunhao He
Make use of struct acpi_device_id::driver_data for version specific information rather than judge the version register. This will help to simplify the probe process and also a bit easier for extension. Factor out SLLC register definition to struct hisi_sllc_pmu_regs. No functional changes intended. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-4-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driverJunhao He
HiSilicon DDRC v3 PMU has the different interrupt register offset compared to the v2. Add device information of v3 PMU with ACPI HID HISI0235. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-3-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Simplify the probe process for each DDRC versionJunhao He
Version 1 and 2 of DDRC PMU also use different HID. Make use of struct acpi_device_id::driver_data for version specific information rather than judge the version register. This will help to simplify the probe process and also a bit easier for extension. In order to support this extend struct hisi_pmu_dev_info for version specific counter bits and event range. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/arm-ni: Support sharing IRQs within an NI instanceShouping Wang
NI-700 has a distinct PMU interrupt output for each Clock Domain, however some integrations may still combine these together externally. The initial driver didn't attempt to support this, in anticipation of a more general solution for IRQ sharing between system PMU instances, but that's still a way off, so let's make this intermediate step for now to at least allow sharing IRQs within an individual NI instance. Now that CPU affinity and migration are cleaned up, it's fairly straightforward to adopt similar logic to arm-cmn, to identify CDs with a common interrupt and loop over them directly in the handler. Signed-off-by: Shouping Wang <allen.wang@hj-micro.com> [ rm: Rework for affinity handling, cosmetics, new commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/f62db639d3b54c959ec477db7b8ccecbef1ca310.1752256072.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/arm-ni: Consolidate CPU affinity handlingRobin Murphy
Since overflow interrupts from the individual PMUs are infrequent and unlikely to coincide, and we make no attempt to balance them across CPUs anyway, there's really not much point tracking a separate CPU affinity per PMU. Move the CPU affinity and hotplug migration up to the NI instance level. Tested-by: Shouping Wang <allen.wang@hj-micro.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/00b622872006c2f0c89485e343b1cb8caaa79c47.1752256072.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentationAlok Tiwari
Fix several minor typo errors in comments: - Remove duplicated word "a" in "a a VID / GroupID". - Correct "Opcopdes" to "Opcodes" in CXL spec reference. - Fix spelling of "implemnted" to "implemented". Improves code readability and documentation consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-4-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Remove unintended newline from IRQ name format stringAlok Tiwari
The IRQ name format string used in devm_kasprintf() mistakenly included a newline character "\n". This could lead to confusing log output or misformatted names in sysfs or debug messages. This fix removes the newline to ensure proper IRQ naming. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-3-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()Alok Tiwari
The previous code mistakenly swapped the count and size parameters. This fix corrects the argument order in devm_kcalloc() to follow the conventional count, size form, avoiding potential confusion or bugs. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-2-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08perf: arm_spe: Relax period restrictionLeo Yan
The minimum interval specified the PMSIDR_EL1.Interval field is a hardware recommendation. However, this value is set by hardware designer before the production. It is not actual hardware limitation but tools currently have no way to test shorter periods. This change relaxes the limitation by allowing any non-zero periods, with simplifying code with clamp_t(). The downside is that small periods may increase the risk of AUX ring buffer overruns. When an overrun occurs, the perf core layer will trigger an irq work to disable the event and wake up the tool in user space to read the trace data. After the tool finishes reading, it will re-enable the AUX event. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250627163028.3503122-1-leo.yan@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)Rob Herring (Arm)
The ARMv9.2 architecture introduces the optional Branch Record Buffer Extension (BRBE), which records information about branches as they are executed into set of branch record registers. BRBE is similar to x86's Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer (BHRB). BRBE supports filtering by exception level and can filter just the source or target address if excluded to avoid leaking privileged addresses. The h/w filter would be sufficient except when there are multiple events with disjoint filtering requirements. In this case, BRBE is configured with a union of all the events' desired branches, and then the recorded branches are filtered based on each event's filter. For example, with one event capturing kernel events and another event capturing user events, BRBE will be configured to capture both kernel and user branches. When handling event overflow, the branch records have to be filtered by software to only include kernel or user branch addresses for that event. In contrast, x86 simply configures LBR using the last installed event which seems broken. It is possible on x86 to configure branch filter such that no branches are ever recorded (e.g. -j save_type). For BRBE, events with a configuration that will result in no samples are rejected. Recording branches in KVM guests is not supported like x86. However, perf on x86 allows requesting branch recording in guests. The guest events are recorded, but the resulting branches are all from the host. For BRBE, events with branch recording and "exclude_host" set are rejected. Requiring "exclude_guest" to be set did not work. The default for the perf tool does set "exclude_guest" if no exception level options are specified. However, specifying kernel or user events defaults to including both host and guest. In this case, only host branches are recorded. BRBE can support some additional exception branch types compared to x86. On x86, all exceptions other than syscalls are recorded as IRQ. With BRBE, it is possible to better categorize these exceptions. One limitation relative to x86 is we cannot distinguish a syscall return from other exception returns. So all exception returns are recorded as ERET type. The FIQ branch type is omitted as the only FIQ user is Apple platforms which don't support BRBE. The debug branch types are omitted as there is no clear need for them. BRBE records are invalidated whenever events are reconfigured, a new task is scheduled in, or after recording is paused (and the records have been recorded for the event). The architecture allows branch records to be invalidated by the PE under implementation defined conditions. It is expected that these conditions are rare. Cc: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-4-e7775563036e@kernel.org [will: Fix sparse warnings about mixed declarations and code. Fix C99 comment syntax.] Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm: Add missing .suppress_bind_attrsRobin Murphy
PMU drivers should set .suppress_bind_attrs so that userspace is denied the opportunity to pull the driver out from underneath an in-use PMU (with predictably unpleasant consequences). Somehow both the CMN and NI drivers have managed to miss this; put that right. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/acd48c341b33b96804a3969ee00b355d40c546e2.1751465293.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-cmn: Reduce stack usage during discoveryRobin Murphy
Arnd reports that Clang's aggressive inlining of arm_cmn_discover() can lead to stack frame size warnings, and while we could simply prevent such inlining to hide the issue, it seems more productive to actually heed the warning and do something about the overall stack footprint. The xp_region array is already rather large, and CMN_MAX_XPS might only grow larger in future, however it only serves as a convenience to save repeating the first level's worth of register reads in the second pass of discovery. There's no performance concern here, and it only takes a small tweak to the flow to re-extract the offsets instead of stashing them, so let's just do that and save several hundred bytes of stack. Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf: imx9_perf: make the read-only array mask static constColin Ian King
Don't populate the read-only array mask on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250611133917.170888-1-colin.i.king@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-cmn: Broaden module description for wider interconnect supportZhiyuan Dai
The current MODULE_DESCRIPTION only mentions CMN-600, but this driver now supports several Arm mesh interconnects including CMN-650, CMN-700, CI-700, and CMN-S3. Update the MODULE_DESCRIPTION to reflect the expanded scope. Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn> Link: https://lore.kernel.org/r/20250522032122.949373-1-daizhiyuan@phytium.com.cn Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-ni: Set initial IRQ affinityRobin Murphy
While we do request our IRQs with the right flags to stop their affinity changing unexpectedly, we forgot to actually set it to start with. Oops. Cc: stable@vger.kernel.org Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Shouping Wang <allen.wang@hj-micro.com> Link: https://lore.kernel.org/r/614ced9149ee8324e58930862bd82cbf46228d27.1747149165.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-28Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting. Summary: ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected Memory management: - Prevent BSS exports from being used by the early PI code - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support - Fix default setting of the OUTPUT variable so that tests are installed in the right location vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function Miscellaneous: - Add some missing header inclusions to the CCA headers - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical) - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: cputype: Add cputype definition for HIP12 arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again perf/arm-cmn: Add CMN S3 ACPI binding arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace perf/arm-cmn: Initialise cmn->cpu earlier kselftest/arm64: Set default OUTPUT path when undefined arm64: Update comment regarding values in __boot_cpu_mode arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/cpuinfo: only show one cpu's info in c_show() arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop ...
2025-05-21perf/apple_m1: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-10-kan.liang@linux.intel.com
2025-05-21perf/arm: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-9-kan.liang@linux.intel.com
2025-05-19perf/arm-cmn: Add CMN S3 ACPI bindingRobin Murphy
An ACPI binding for CMN S3 was not yet finalised when the driver support was originally written, but v1.2 of DEN0093 "ACPI for Arm Components" has at last been published; support ACPI systems using the proper HID. Cc: stable@vger.kernel.org Fixes: 0dc2f4963f7e ("perf/arm-cmn: Support CMN S3") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/7dafe147f186423020af49d7037552ee59c60e97.1747652164.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-16perf/arm-cmn: Initialise cmn->cpu earlierRobin Murphy
For all the complexity of handling affinity for CPU hotplug, what we've apparently managed to overlook is that arm_cmn_init_irqs() has in fact always been setting the *initial* affinity of all IRQs to CPU 0, not the CPU we subsequently choose for event scheduling. Oh dear. Cc: stable@vger.kernel.org Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/b12fccba6b5b4d2674944f59e4daad91cd63420b.1747069914.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-09perf/amlogic: Replace smp_processor_id() with raw_smp_processor_id() in ↵Anand Moon
meson_ddr_pmu_create() The Amlogic DDR PMU driver meson_ddr_pmu_create() function incorrectly uses smp_processor_id(), which assumes disabled preemption. This leads to kernel warnings during module loading because meson_ddr_pmu_create() can be called in a preemptible context. Following kernel warning and stack trace: [ 31.745138] [ T2289] BUG: using smp_processor_id() in preemptible [00000000] code: (udev-worker)/2289 [ 31.745154] [ T2289] caller is debug_smp_processor_id+0x28/0x38 [ 31.745172] [ T2289] CPU: 4 UID: 0 PID: 2289 Comm: (udev-worker) Tainted: GW 6.14.0-0-MANJARO-ARM #1 59519addcbca6ba8de735e151fd7b9e97aac7ff0 [ 31.745181] [ T2289] Tainted: [W]=WARN [ 31.745183] [ T2289] Hardware name: Hardkernel ODROID-N2Plus (DT) [ 31.745188] [ T2289] Call trace: [ 31.745191] [ T2289] show_stack+0x28/0x40 (C) [ 31.745199] [ T2289] dump_stack_lvl+0x4c/0x198 [ 31.745205] [ T2289] dump_stack+0x20/0x50 [ 31.745209] [ T2289] check_preemption_disabled+0xec/0xf0 [ 31.745213] [ T2289] debug_smp_processor_id+0x28/0x38 [ 31.745216] [ T2289] meson_ddr_pmu_create+0x200/0x560 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745237] [ T2289] g12_ddr_pmu_probe+0x20/0x38 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745246] [ T2289] platform_probe+0x98/0xe0 [ 31.745254] [ T2289] really_probe+0x144/0x3f8 [ 31.745258] [ T2289] __driver_probe_device+0xb8/0x180 [ 31.745261] [ T2289] driver_probe_device+0x54/0x268 [ 31.745264] [ T2289] __driver_attach+0x11c/0x288 [ 31.745267] [ T2289] bus_for_each_dev+0xfc/0x160 [ 31.745274] [ T2289] driver_attach+0x34/0x50 [ 31.745277] [ T2289] bus_add_driver+0x160/0x2b0 [ 31.745281] [ T2289] driver_register+0x78/0x120 [ 31.745285] [ T2289] __platform_driver_register+0x30/0x48 [ 31.745288] [ T2289] init_module+0x30/0xfe0 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745298] [ T2289] do_one_initcall+0x11c/0x438 [ 31.745303] [ T2289] do_init_module+0x68/0x228 [ 31.745311] [ T2289] load_module+0x118c/0x13a8 [ 31.745315] [ T2289] __arm64_sys_finit_module+0x274/0x390 [ 31.745320] [ T2289] invoke_syscall+0x74/0x108 [ 31.745326] [ T2289] el0_svc_common+0x90/0xf8 [ 31.745330] [ T2289] do_el0_svc+0x2c/0x48 [ 31.745333] [ T2289] el0_svc+0x60/0x150 [ 31.745337] [ T2289] el0t_64_sync_handler+0x80/0x118 [ 31.745341] [ T2289] el0t_64_sync+0x1b8/0x1c0 Changes replaces smp_processor_id() with raw_smp_processor_id() to ensure safe CPU ID retrieval in preemptible contexts. Cc: Jiucheng Xu <jiucheng.xu@amlogic.com> Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver") Signed-off-by: Anand Moon <linux.amoon@gmail.com> Link: https://lore.kernel.org/r/20250407063206.5211-1-linux.amoon@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-09perf/arm-cmn: Fix REQ2/SNP2 mixupRobin Murphy
Somehow the encodings for REQ2/SNP2 channels in XP events got mixed up... Unmix them. CC: stable@vger.kernel.org Fixes: 23760a014417 ("perf/arm-cmn: Add CMN-700 support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17perf: Do not enable by default during compile testingKrzysztof Kozlowski
Enabling the compile test should not cause automatic enabling of all drivers, but only allow to choose to compile them. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250417074650.81561-1-krzysztof.kozlowski@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17perf: arm-ni: Fix missing platform_set_drvdata()Hongbo Yao
Add missing platform_set_drvdata in arm_ni_probe(), otherwise calling platform_get_drvdata() in remove returns NULL. Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU") Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250401054248.3985814-1-andy.xu@hj-micro.com Signed-off-by: Will Deacon <will@kernel.org>