summaryrefslogtreecommitdiff
path: root/drivers/perf/riscv_pmu_sbi.c
AgeCommit message (Collapse)Author
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-09drivers/perf: riscv: Do not allow invalid raw event configAtish Patra
The SBI specification allows only lower 48bits of hpmeventX to be configured via SBI PMU. Currently, the driver masks of the higher bits but doesn't return an error. This will lead to an additional SBI call for config matching which should return for an invalid event error in most of the cases. However, if a platform(i.e Rocket and sifive cores) implements a bitmap of all bits in the event encoding this will lead to an incorrect event being programmed leading to user confusion. Report the error to the user if higher bits are set during the event mapping itself to avoid the confusion and save an additional SBI call. Suggested-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-3-813e8a4f5962@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-09drivers/perf: riscv: Return error for default caseAtish Patra
If the upper two bits has an invalid valid (0x1), the event mapping is not reliable as it returns an uninitialized variable. Return appropriate value for the default case. Fixes: f0c9363db2dd ("perf/riscv-sbi: Add platform specific firmware event handling") Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-2-813e8a4f5962@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-09drivers/perf: riscv: Fix Platform firmware event dataAtish Patra
Platform firmware event data field is allowed to be 62 bits for Linux as uppper most two bits are reserved to indicate SBI fw or platform specific firmware events. However, the event data field is masked as per the hardware raw event mask which is not correct. Fix the platform firmware event data field with proper mask. Fixes: f0c9363db2dd ("perf/riscv-sbi: Add platform specific firmware event handling") Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-1-813e8a4f5962@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-12drivers: perf: Fix wrong put_cpu() placementAlexandre Ghiti
Unfortunately, the wrong patch version was merged which places the put_cpu() after enabling a static key, which is not safe as pointed by Will [1], so move put_cpu() before to avoid this. Fixes: 2840dadf0dde ("drivers: perf: Fix smp_processor_id() use in preemptible code") Reported-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/all/20240827125335.GD4772@willie-the-truck/ [1] Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241112113422.617954-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-10-01drivers/perf: riscv: Align errno for unsupported perf eventPu Lehui
RISC-V perf driver does not yet support PERF_TYPE_BREAKPOINT. It would be more appropriate to return -EOPNOTSUPP or -ENOENT for this type in pmu_sbi_event_map. Considering that other implementations return -ENOENT for unsupported perf types, let's synchronize this behavior. Due to this reason, a riscv bpf testcases perf_skip fail. Meanwhile, align that behavior to the rest of proper place. Signed-off-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Fixes: 9b3e150e310e ("RISC-V: Add a simple platform driver for RISC-V legacy perf") Fixes: 16d3b1af0944 ("perf: RISC-V: Check standard event availability") Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240831071520.1630360-1-pulehui@huaweicloud.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-24Merge tag 'riscv-for-linus-6.12-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support using Zkr to seed KASLR - Support IPI-triggered CPU backtracing - Support for generic CPU vulnerabilities reporting to userspace - A few cleanups for missing licenses - The size limit on the XIP kernel has been removed - Support for tracing userspace stacks - Support for the Svvptc extension - Various cleanups and fixes throughout the tree * tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits) crash: Fix riscv64 crash memory reserve dead loop perf/riscv-sbi: Add platform specific firmware event handling tools: Optimize ring buffer for riscv tools: Add riscv barrier implementation RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE riscv: Enable bitops instrumentation riscv: Omit optimized string routines when using KASAN ACPI: RISCV: Make acpi_numa_get_nid() to be static riscv: Randomize lower bits of stack address selftests: riscv: Allow mmap test to compile on 32-bit riscv: Make riscv_isa_vendor_ext_andes array static riscv: Use LIST_HEAD() to simplify code riscv: defconfig: Disable RZ/Five peripheral support RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup riscv: avoid Imbalance in RAS riscv: cacheinfo: Add back init_cache_level() function riscv: Remove unused _TIF_WORK_MASK drivers/perf: riscv: Remove redundant macro check riscv: define ILLEGAL_POINTER_VALUE for 64bit ...
2024-09-20perf/riscv-sbi: Add platform specific firmware event handlingMayuresh Chitale
The SBI v2.0 specification pointed to by the link below reserves the event code 0xffff for platform specific firmware events. Update the driver to be able to parse and program such events. The platform specific firmware events must now be specified in the perf command as below: perf stat -e rCxxx ... where bits[63:62] = 0x3 of the event config indicate a platform specific firmware event and xxx indicate the actual event code which is passed as the event data. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v2.0/riscv-sbi.pdf Link: https://lore.kernel.org/r/20240812051109.6496-1-mchitale@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-10drivers: perf: Fix smp_processor_id() use in preemptible codeAlexandre Ghiti
As reported in [1], the use of smp_processor_id() in pmu_sbi_device_probe() must be protected by disabling the preemption, so simple use get_cpu()/put_cpu() instead. Reported-by: Nam Cao <namcao@linutronix.de> Closes: https://lore.kernel.org/linux-riscv/20240820074925.ReMKUPP3@linutronix.de/ [1] Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Nam Cao <namcao@linutronix.de> Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function") Reported-by: Andrea Parri <parri.andrea@gmail.com> Tested-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240826165210.124696-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01perf: riscv: Fix selecting counters in legacy modeShifrin Dmitry
It is required to check event type before checking event config. Events with the different types can have the same config. This check is missed for legacy mode code For such perf usage: sysctl -w kernel.perf_user_access=2 perf stat -e cycles,L1-dcache-loads -- driver will try to force both events to CYCLE counter. This commit implements event type check before forcing events on the special counters. Signed-off-by: Shifrin Dmitry <dmitry.shifrin@syntacore.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Fixes: cc4c07c89aad ("drivers: perf: Implement perf event mmap support in the SBI backend") Link: https://lore.kernel.org/r/20240729125858.630653-1-dmitry.shifrin@syntacore.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-27Merge tag 'riscv-for-linus-6.11-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. - The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. - Support for HAVE_ARCH_STACKLEAK. - The sv39 linear map has been extended to support 128GiB mappings. - The frequency of the mtime CSR is now visible via hwprobe. * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits) RISC-V: Provide the frequency of time CSR via hwprobe riscv: Extend sv39 linear mapping max size to 128G riscv: enable HAVE_ARCH_STACKLEAK riscv: signal: Remove unlikely() from WARN_ON() condition riscv: Improve exception and system call latency RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() RISC-V: ACPI: Enable SPCR table for console output on RISC-V riscv: boot: remove duplicated targets line trace: riscv: Remove deprecated kprobe on ftrace support riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions RISC-V: run savedefconfig for defconfig RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ...
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-22Merge patch series "riscv: Separate vendor extensions from standard extensions"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: All extensions, both standard and vendor, live in one struct "riscv_isa_ext". There is currently one vendor extension, xandespmu, but it is likely that more vendor extensions will be added to the kernel in the future. As more vendor extensions (and standard extensions) are added, riscv_isa_ext will become more bloated with a mix of vendor and standard extensions. This also allows each vendor to be conditionally enabled through Kconfig. * b4-shazam-merge: riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-0-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: Introduce vendor variants of extension helpersCharlie Jenkins
Vendor extensions are maintained in per-vendor structs (separate from standard extensions which live in riscv_isa). Create vendor variants for the existing extension helpers to interface with the riscv_isa_vendor bitmaps. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-3-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: Extend cpufeature.c to detect vendor extensionsCharlie Jenkins
Instead of grouping all vendor extensions into the same riscv_isa_ext that standard instructions use, create a struct "riscv_isa_vendor_ext_data_list" that allows each vendor to maintain their vendor extensions independently of the standard extensions. xandespmu is currently the only vendor extension so that is the only extension that is affected by this change. An additional benefit of this is that the extensions of each vendor can be conditionally enabled. A config RISCV_ISA_VENDOR_EXT_ANDES has been added to allow for that. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-1-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03perf: RISC-V: Check standard event availabilitySamuel Holland
The RISC-V SBI PMU specification defines several standard hardware and cache events. Currently, all of these events are exposed to userspace, even when not actually implemented. They appear in the `perf list` output, and commands like `perf stat` try to use them. This is more than just a cosmetic issue, because the PMU driver's .add function fails for these events, which causes pmu_groups_sched_in() to prematurely stop scheduling in other (possibly valid) hardware events. Add logic to check which events are supported by the hardware (i.e. can be mapped to some counter), so only usable events are reported to userspace. Since the kernel does not know the mapping between events and possible counters, this check must happen during boot, when no counters are in use. Make the check asynchronous to minimize impact on boot time. Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpusSamuel Holland
Currently, we stop all the counters while a new cpu is brought online. However, the hpmevent to counter mappings are not reset. The firmware may have some stale encoding in their mapping structure which may lead to undesirable results. We have not encountered such scenario though. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. LoongArch: - Add ParaVirt IPI support - Add software breakpoint support - Add mmio trace events support RISC-V: - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak - Some preparatory work for both TDX and SNP page fault handling. This also cleans up the page fault path, so that the priorities of various kinds of fauls (private page, no memory, write to read-only slot, etc.) are easier to follow. x86: - Minimize amount of time that shadow PTEs remain in the special REMOVED_SPTE state. This is a state where the mmu_lock is held for reading but concurrent accesses to the PTE have to spin; shortening its use allows other vCPUs to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables. - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is defined by hardware but left for software use. This lets KVM communicate its inability to map GPAs that set bits 51:48 on hosts without 5-level nested page tables. Guest firmware is expected to use the information when mapping BARs; this avoids that they end up at a legal, but unmappable, GPA. - Fixed a bug where KVM would not reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - As usual, a bunch of code cleanups. x86 (AMD): - Implement a new and improved API to initialize SEV and SEV-ES VMs, which will also be extendable to SEV-SNP. The new API specifies the desired encryption in KVM_CREATE_VM and then separately initializes the VM. The new API also allows customizing the desired set of VMSA features; the features affect the measurement of the VM's initial state, and therefore enabling them cannot be done tout court by the hypervisor. While at it, the new API includes two bugfixes that couldn't be applied to the old one without a flag day in userspace or without affecting the initial measurement. When a SEV-ES VM is created with the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected once the VMSA has been encrypted. Also, the FPU and AVX state will be synchronized and encrypted too. - Support for GHCB version 2 as applicable to SEV-ES guests. This, once more, is only accessible when using the new KVM_SEV_INIT2 flow for initialization of SEV-ES VMs. x86 (Intel): - An initial bunch of prerequisite patches for Intel TDX were merged. They generally don't do anything interesting. The only somewhat user visible change is a new debugging mode that checks that KVM's MMU never triggers a #VE virtualization exception in the guest. - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. Generic: - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Remove .change_pte() MMU notifier - the changes to non-KVM code are small and Andrew Morton asked that I also take those through the KVM tree. The callback was only ever implemented by KVM (which was also the original user of MMU notifiers) but it had been nonfunctional ever since calls to set_pte_at_notify were wrapped with invalidate_range_start and invalidate_range_end... in 2012. Selftests: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell script, to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies. - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful. - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but determinstic numbers. - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses. - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup. Documentation: - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits) selftests/kvm: remove dead file KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: SEV: Allow per-guest configuration of GHCB protocol version KVM: SEV: Add GHCB handling for termination requests KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests KVM: SEV: Add support to handle AP reset MSR protocol KVM: x86: Explicitly zero kvm_caps during vendor module load KVM: x86: Fully re-initialize supported_mce_cap on vendor module load KVM: x86: Fully re-initialize supported_vm_types on vendor module load KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values ...
2024-04-26drivers/perf: riscv: Implement SBI PMU snapshot functionAtish Patra
SBI v2.0 SBI introduced PMU snapshot feature which adds the following features. 1. Read counter values directly from the shared memory instead of csr read. 2. Start multiple counters with initial values with one SBI call. These functionalities optimizes the number of traps to the higher privilege mode. If the kernel is in VS mode while the hypervisor deploy trap & emulate method, this would minimize all the hpmcounter CSR read traps. If the kernel is running in S-mode, the benefits reduced to CSR latency vs DRAM/cache latency as there is no trap involved while accessing the hpmcounter CSRs. In both modes, it does saves the number of ecalls while starting multiple counter together with an initial values. This is a likely scenario if multiple counters overflow at the same time. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-10-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22drivers/perf: riscv: Fix counter mask iteration for RV32Atish Patra
For RV32, used_hw_ctrs can have more than 1 word if the firmware chooses to interleave firmware/hardware counters indicies. Even though it's a unlikely scenario, handle that case by iterating over all the words instead of just using the first word. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-9-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22drivers/perf: riscv: Use BIT macro for shifting operationsAtish Patra
It is a good practice to use BIT() instead of (1 << x). Replace the current usages with BIT(). Take this opportunity to replace few (1UL << x) with BIT() as well for consistency. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-5-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22drivers/perf: riscv: Read upper bits of a firmware counterAtish Patra
SBI v2.0 introduced a explicit function to read the upper 32 bits for any firmware counter width that is longer than 32bits. This is only applicable for RV32 where firmware counter can be 64 bit. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-4-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISC-V: Fix the typo in Scountovf CSR nameAtish Patra
The counter overflow CSR name is "scountovf" not "sscountovf". Fix the csr name. Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support") Reviewed-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-2-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-19perf/riscv: Assign parents for event_source devicesJonathan Cameron
Currently all these devices appear directly under /sys/devices/ Only root busses should appear there, so instead assign the pmu->dev parents to be the appropriate platform devices. Link: https://lore.kernel.org/linux-cxl/ZCLI9A40PJsyqAmq@kroah.com/ Cc: Atish Patra <atishp@atishpatra.org> CC: Anup Patel <anup@brainfault.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240412161057.14099-13-Jonathan.Cameron@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09drivers: perf: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from sbi_pmu_sysctl_table Signed-off-by: Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240328-jag-sysctl_remset_misc-v1-7-47c1463b3af2@samsung.com Signed-off-by: Will Deacon <will@kernel.org>
2024-03-22Merge tag 'riscv-for-linus-6.9-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various vector-accelerated crypto routines - Hibernation is now enabled for portable kernel builds - mmap_rnd_bits_max is larger on systems with larger VAs - Support for fast GUP - Support for membarrier-based instruction cache synchronization - Support for the Andes hart-level interrupt controller and PMU - Some cleanups around unaligned access speed probing and Kconfig settings - Support for ACPI LPI and CPPC - Various cleanus related to barriers - A handful of fixes * tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ...
2024-03-12perf: RISC-V: Introduce Andes PMU to support perf event samplingYu Chien Peter Lin
Assign riscv_pmu_irq_num the value of (256 + 18) for the custome PMU and add SSCOUNTOVF and SIP alternatives to ALT_SBI_PMU_OVERFLOW() and ALT_SBI_PMU_OVF_CLEAR_PENDING() macros, respectively. To make use of Andes PMU extension, "xandespmu" needs to be appended to the riscv,isa-extensions for each cpu node in device-tree, and make sure CONFIG_ANDES_CUSTOM_PMU is enabled. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Charles Ci-Jyun Wu <dminus@andestech.com> Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com> Co-developed-by: Locus Wei-Han Chen <locus84@andestech.com> Signed-off-by: Locus Wei-Han Chen <locus84@andestech.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240222083946.3977135-8-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-12perf: RISC-V: Eliminate redundant interrupt enable/disable operationsYu Chien Peter Lin
The interrupt enable/disable operations are already performed by the IRQ chip functions riscv_intc_irq_unmask()/riscv_intc_irq_mask() during enable_percpu_irq()/disable_percpu_irq(). It can be done only once. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240222083946.3977135-7-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-29perf: RISCV: Fix panic on pmu overflow handlerFei Wu
(1 << idx) of int is not desired when setting bits in unsigned long overflowed_ctrs, use BIT() instead. This panic happens when running 'perf record -e branches' on sophgo sg2042. [ 273.311852] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098 [ 273.320851] Oops [#1] [ 273.323179] Modules linked in: [ 273.326303] CPU: 0 PID: 1475 Comm: perf Not tainted 6.6.0-rc3+ #9 [ 273.332521] Hardware name: Sophgo Mango (DT) [ 273.336878] epc : riscv_pmu_ctr_get_width_mask+0x8/0x62 [ 273.342291] ra : pmu_sbi_ovf_handler+0x2e0/0x34e [ 273.347091] epc : ffffffff80aecd98 ra : ffffffff80aee056 sp : fffffff6e36928b0 [ 273.354454] gp : ffffffff821f82d0 tp : ffffffd90c353200 t0 : 0000002ade4f9978 [ 273.361815] t1 : 0000000000504d55 t2 : ffffffff8016cd8c s0 : fffffff6e3692a70 [ 273.369180] s1 : 0000000000000020 a0 : 0000000000000000 a1 : 00001a8e81800000 [ 273.376540] a2 : 0000003c00070198 a3 : 0000003c00db75a4 a4 : 0000000000000015 [ 273.383901] a5 : ffffffd7ff8804b0 a6 : 0000000000000015 a7 : 000000000000002a [ 273.391327] s2 : 000000000000ffff s3 : 0000000000000000 s4 : ffffffd7ff8803b0 [ 273.398773] s5 : 0000000000504d55 s6 : ffffffd905069800 s7 : ffffffff821fe210 [ 273.406139] s8 : 000000007fffffff s9 : ffffffd7ff8803b0 s10: ffffffd903f29098 [ 273.413660] s11: 0000000080000000 t3 : 0000000000000003 t4 : ffffffff8017a0ca [ 273.421022] t5 : ffffffff8023cfc2 t6 : ffffffd9040780e8 [ 273.426437] status: 0000000200000100 badaddr: 0000000000000098 cause: 000000000000000d [ 273.434512] [<ffffffff80aecd98>] riscv_pmu_ctr_get_width_mask+0x8/0x62 [ 273.441169] [<ffffffff80076bd8>] handle_percpu_devid_irq+0x98/0x1ee [ 273.447562] [<ffffffff80071158>] generic_handle_domain_irq+0x28/0x36 [ 273.454151] [<ffffffff8047a99a>] riscv_intc_irq+0x36/0x4e [ 273.459659] [<ffffffff80c944de>] handle_riscv_irq+0x4a/0x74 [ 273.465442] [<ffffffff80c94c48>] do_irq+0x62/0x92 [ 273.470360] Code: 0420 60a2 6402 5529 0141 8082 0013 0000 0013 0000 (6d5c) b783 [ 273.477921] ---[ end trace 0000000000000000 ]--- [ 273.482630] Kernel panic - not syncing: Fatal exception in interrupt Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Fei Wu <fei2.wu@intel.com> Link: https://lore.kernel.org/r/20240228115425.2613856-1-fei2.wu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-09riscv: Rearrange hwcap.h and cpufeature.hXiao Wang
Now hwcap.h and cpufeature.h are mutually including each other, and most of the variable/API declarations in hwcap.h are implemented in cpufeature.c, so, it's better to move them into cpufeature.h and leave only macros for ISA extension logical IDs in hwcap.h. BTW, the riscv_isa_extension_mask macro is not used now, so this patch removes it. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231031064553.2319688-2-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-09Merge patch "drivers: perf: Do not broadcast to other cpus when starting a ↵Palmer Dabbelt
counter" This is really just a single patch, but since the offending fix hasn't yet made it to my for-next I'm merging it here. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-09drivers: perf: Do not broadcast to other cpus when starting a counterAlexandre Ghiti
This command: $ perf record -e cycles:k -e instructions:k -c 10000 -m 64M dd if=/dev/zero of=/dev/null count=1000 gives rise to this kernel warning: [ 444.364395] WARNING: CPU: 0 PID: 104 at kernel/smp.c:775 smp_call_function_many_cond+0x42c/0x436 [ 444.364515] Modules linked in: [ 444.364657] CPU: 0 PID: 104 Comm: perf-exec Not tainted 6.6.0-rc6-00051-g391df82e8ec3-dirty #73 [ 444.364771] Hardware name: riscv-virtio,qemu (DT) [ 444.364868] epc : smp_call_function_many_cond+0x42c/0x436 [ 444.364917] ra : on_each_cpu_cond_mask+0x20/0x32 [ 444.364948] epc : ffffffff8009f9e0 ra : ffffffff8009fa5a sp : ff20000000003800 [ 444.364966] gp : ffffffff81500aa0 tp : ff60000002b83000 t0 : ff200000000038c0 [ 444.364982] t1 : ffffffff815021f0 t2 : 000000000000001f s0 : ff200000000038b0 [ 444.364998] s1 : ff60000002c54d98 a0 : ff60000002a73940 a1 : 0000000000000000 [ 444.365013] a2 : 0000000000000000 a3 : 0000000000000003 a4 : 0000000000000100 [ 444.365029] a5 : 0000000000010100 a6 : 0000000000f00000 a7 : 0000000000000000 [ 444.365044] s2 : 0000000000000000 s3 : ffffffffffffffff s4 : ff60000002c54d98 [ 444.365060] s5 : ffffffff81539610 s6 : ffffffff80c20c48 s7 : 0000000000000000 [ 444.365075] s8 : 0000000000000000 s9 : 0000000000000001 s10: 0000000000000001 [ 444.365090] s11: ffffffff80099394 t3 : 0000000000000003 t4 : 00000000eac0c6e6 [ 444.365104] t5 : 0000000400000000 t6 : ff60000002e010d0 [ 444.365120] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 444.365226] [<ffffffff8009f9e0>] smp_call_function_many_cond+0x42c/0x436 [ 444.365295] [<ffffffff8009fa5a>] on_each_cpu_cond_mask+0x20/0x32 [ 444.365311] [<ffffffff806e90dc>] pmu_sbi_ctr_start+0x7a/0xaa [ 444.365327] [<ffffffff806e880c>] riscv_pmu_start+0x48/0x66 [ 444.365339] [<ffffffff8012111a>] perf_adjust_freq_unthr_context+0x196/0x1ac [ 444.365356] [<ffffffff801237aa>] perf_event_task_tick+0x78/0x8c [ 444.365368] [<ffffffff8003faf4>] scheduler_tick+0xe6/0x25e [ 444.365383] [<ffffffff8008a042>] update_process_times+0x80/0x96 [ 444.365398] [<ffffffff800991ec>] tick_sched_handle+0x26/0x52 [ 444.365410] [<ffffffff800993e4>] tick_sched_timer+0x50/0x98 [ 444.365422] [<ffffffff8008a6aa>] __hrtimer_run_queues+0x126/0x18a [ 444.365433] [<ffffffff8008b350>] hrtimer_interrupt+0xce/0x1da [ 444.365444] [<ffffffff806cdc60>] riscv_timer_interrupt+0x30/0x3a [ 444.365457] [<ffffffff8006afa6>] handle_percpu_devid_irq+0x80/0x114 [ 444.365470] [<ffffffff80065b82>] generic_handle_domain_irq+0x1c/0x2a [ 444.365483] [<ffffffff8045faec>] riscv_intc_irq+0x2e/0x46 [ 444.365497] [<ffffffff808a9c62>] handle_riscv_irq+0x4a/0x74 [ 444.365521] [<ffffffff808aa760>] do_irq+0x7c/0x7e [ 444.365796] ---[ end trace 0000000000000000 ]--- That's because the fix in commit 3fec323339a4 ("drivers: perf: Fix panic in riscv SBI mmap support") was wrong since there is no need to broadcast to other cpus when starting a counter, that's only needed in mmap when the counters could have already been started on other cpus, so simply remove this broadcast. Fixes: 3fec323339a4 ("drivers: perf: Fix panic in riscv SBI mmap support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Clément Léger <cleger@rivosinc.com> Tested-by: Yu Chien Peter Lin <peterlin@andestech.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> #On Link: https://lore.kernel.org/r/20231026084010.11888-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-09drivers: perf: Check find_first_bit() return valueAlexandre Ghiti
We must check the return value of find_first_bit() before using the return value as an index array since it happens to overflow the array and then panic: [ 107.318430] Kernel BUG [#1] [ 107.319434] CPU: 3 PID: 1238 Comm: kill Tainted: G E 6.6.0-rc6ubuntu-defconfig #2 [ 107.319465] Hardware name: riscv-virtio,qemu (DT) [ 107.319551] epc : pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.319840] ra : pmu_sbi_ovf_handler+0x52/0x3ae [ 107.319868] epc : ffffffff80a0a77c ra : ffffffff80a0a42a sp : ffffaf83fecda350 [ 107.319884] gp : ffffffff823961a8 tp : ffffaf8083db1dc0 t0 : ffffaf83fecda480 [ 107.319899] t1 : ffffffff80cafe62 t2 : 000000000000ff00 s0 : ffffaf83fecda520 [ 107.319921] s1 : ffffaf83fecda380 a0 : 00000018fca29df0 a1 : ffffffffffffffff [ 107.319936] a2 : 0000000001073734 a3 : 0000000000000004 a4 : 0000000000000000 [ 107.319951] a5 : 0000000000000040 a6 : 000000001d1c8774 a7 : 0000000000504d55 [ 107.319965] s2 : ffffffff82451f10 s3 : ffffffff82724e70 s4 : 000000000000003f [ 107.319980] s5 : 0000000000000011 s6 : ffffaf8083db27c0 s7 : 0000000000000000 [ 107.319995] s8 : 0000000000000001 s9 : 00007fffb45d6558 s10: 00007fffb45d81a0 [ 107.320009] s11: ffffaf7ffff60000 t3 : 0000000000000004 t4 : 0000000000000000 [ 107.320023] t5 : ffffaf7f80000000 t6 : ffffaf8000000000 [ 107.320037] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 107.320081] [<ffffffff80a0a77c>] pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.320112] [<ffffffff800b42d0>] handle_percpu_devid_irq+0x9e/0x1a0 [ 107.320131] [<ffffffff800ad92c>] generic_handle_domain_irq+0x28/0x36 [ 107.320148] [<ffffffff8065f9f8>] riscv_intc_irq+0x36/0x4e [ 107.320166] [<ffffffff80caf4a0>] handle_riscv_irq+0x54/0x86 [ 107.320189] [<ffffffff80cb0036>] do_irq+0x64/0x96 [ 107.320271] Code: 85a6 855e b097 ff7f 80e7 9220 b709 9002 4501 bbd9 (9002) 6097 [ 107.320585] ---[ end trace 0000000000000000 ]--- [ 107.320704] Kernel panic - not syncing: Fatal exception in interrupt [ 107.320775] SMP: stopping secondary CPUs [ 107.321219] Kernel Offset: 0x0 from 0xffffffff80000000 [ 107.333051] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231109082128.40777-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12drivers: perf: Fix panic in riscv SBI mmap supportAlexandre Ghiti
The following panic can happen when mmap is called before the pmu add callback which sets the hardware counter index: this happens for example with the following command `perf record --no-bpf-event -n kill`. [ 99.461486] CPU: 1 PID: 1259 Comm: perf Tainted: G E 6.6.0-rc4ubuntu-defconfig #2 [ 99.461669] Hardware name: riscv-virtio,qemu (DT) [ 99.461748] epc : pmu_sbi_set_scounteren+0x42/0x44 [ 99.462337] ra : smp_call_function_many_cond+0x126/0x5b0 [ 99.462369] epc : ffffffff809f9d24 ra : ffffffff800f93e0 sp : ff60000082153aa0 [ 99.462407] gp : ffffffff82395c98 tp : ff6000009a218040 t0 : ff6000009ab3a4f0 [ 99.462425] t1 : 0000000000000004 t2 : 0000000000000100 s0 : ff60000082153ab0 [ 99.462459] s1 : 0000000000000000 a0 : ff60000098869528 a1 : 0000000000000000 [ 99.462473] a2 : 000000000000001f a3 : 0000000000f00000 a4 : fffffffffffffff8 [ 99.462488] a5 : 00000000000000cc a6 : 0000000000000000 a7 : 0000000000735049 [ 99.462502] s2 : 0000000000000001 s3 : ffffffff809f9ce2 s4 : ff60000098869528 [ 99.462516] s5 : 0000000000000002 s6 : 0000000000000004 s7 : 0000000000000001 [ 99.462530] s8 : ff600003fec98bc0 s9 : ffffffff826c5890 s10: ff600003fecfcde0 [ 99.462544] s11: ff600003fec98bc0 t3 : ffffffff819e2558 t4 : ff1c000004623840 [ 99.462557] t5 : 0000000000000901 t6 : ff6000008feeb890 [ 99.462570] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 99.462658] [<ffffffff809f9d24>] pmu_sbi_set_scounteren+0x42/0x44 [ 99.462979] Code: 1060 4785 97bb 00d7 8fd9 9073 1067 6422 0141 8082 (9002) 0013 [ 99.463335] Kernel BUG [#2] To circumvent this, try to enable userspace access to the hardware counter when it is selected in addition to when the event is mapped. And vice-versa when the event is stopped/unmapped. Fixes: cc4c07c89aad ("drivers: perf: Implement perf event mmap support in the SBI backend") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231006082010.11963-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16drivers: perf: Implement perf event mmap support in the SBI backendAlexandre Ghiti
We used to unconditionnally expose the cycle and instret csrs to userspace, which gives rise to security concerns. So now we only allow access to hw counters from userspace through the perf framework which will handle context switches, per-task events...etc. A sysctl allows to revert the behaviour to the legacy mode so that userspace applications which are not ready for this change do not break. But the default value is to allow userspace only through perf: this will break userspace applications which rely on direct access to rdcycle. This choice was made for security reasons [1][2]: most of the applications which use rdcycle can instead use rdtime to count the elapsed time. [1] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1 [2] https://www.youtube.com/watch?v=3-c4C_L2PRQ&ab_channel=IEEESymposiumonSecurityandPrivacy Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
2023-08-16drivers: perf: Rename riscv pmu sbi driverAlexandre Ghiti
That's just cosmetic, no functional changes. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com>
2023-06-20perf: RISC-V: Limit the number of counters returned from SBIViacheslav Mitrofanov
Perf gets the number of supported counters from SBI. If it happens that the number of returned counters more than RISCV_MAX_COUNTERS the code trusts it. It does not lead to an immediate problem but can potentially lead to it. Prevent getting more than RISCV_MAX_COUNTERS from SBI. Signed-off-by: Viacheslav Mitrofanov <v.v.mitrofanov@yadro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20230505072058.1049732-1-v.v.mitrofanov@yadro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19RISC-V/perf: Use standard interface to get INTC domainSunil V L
Currently the PMU driver is using DT based lookup to find the INTC node for sscofpmf extension. This will not work for ACPI based systems causing the driver to fail to register the PMU overflow interrupt handler. Hence, change the code to use the standard interface to find the INTC node which works irrespective of DT or ACPI. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20230607112417.782085-3-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29RISC-V: Align SBI probe implementation with specAndrew Jones
sbi_probe_extension() is specified with "Returns 0 if the given SBI extension ID (EID) is not available, or 1 if it is available unless defined as any other non-zero value by the implementation." Additionally, sbiret.value is a long. Fix the implementation to ensure any nonzero long value is considered a success, rather than only positive int values. Fixes: b9dcd9e41587 ("RISC-V: Add basic support for SBI v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230427163626.101042-1-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-03Merge tag 'riscv-for-linus-6.3-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Some cleanups and fixes for the Zbb-optimized string routines - Support for custom (vendor or implementation defined) perf events - COMMAND_LINE_SIZE has been increased to 1024 * tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Bump COMMAND_LINE_SIZE value to 1024 drivers/perf: RISC-V: Allow programming custom firmware events riscv, lib: Fix Zbb strncmp RISC-V: improve string-function assembly
2023-03-01drivers/perf: RISC-V: Allow programming custom firmware eventsMayuresh Chitale
Applications need to be able to program the SBI implementation specific or custom firmware events in addition to the standard firmware events. Remove a check in the driver that prohibits the programming of the custom firmware events. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230208074314.3661406-1-mchitale@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
2023-02-07perf: RISC-V: Improve privilege mode filtering for perfAtish Patra
Currently, the host driver doesn't have any method to identify if the requested perf event is from kvm or bare metal. As KVM runs in HS mode, there are no separate hypervisor privilege mode to distinguish between the attributes for guest/host. Improve the privilege mode filtering by using the event specific config1 field. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-02-07perf: RISC-V: Define helper functions expose hpm counter width and countAtish Patra
KVM module needs to know how many hardware counters and the counter width that the platform supports. Otherwise, it will not be able to show optimal value of virtual counters to the guest. The virtual hardware counters also need to have the same width as the logical hardware counters for simplicity. However, there shouldn't be mapping between virtual hardware counters and logical hardware counters. As we don't support hetergeneous harts or counters with different width as of now, the implementation relies on the counter width of the first available programmable counter. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-01-13arm64, riscv, perf: Remove RCU_NONIDLE() usagePeter Zijlstra
The PM notifiers should no longer be ran with RCU disabled (per the previous patches), as such this hack is no longer required either. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195542.151174682@infradead.org
2022-10-27drivers/perf: riscv_pmu_sbi: add support for PMU variant on T-Head C9xx coresHeiko Stuebner
With the T-HEAD C9XX cores being designed before or during the ratification to the SSCOFPMF extension, it implements functionality very similar but not equal to it. It implements overflow handling and also some privilege-mode filtering. While SSCOFPMF supports this for all modes, the C9XX only implements the filtering for M-mode and S-mode but not user-mode. So add some adaptions to allow the C9XX to still handle its PMU through the regular SBI PMU interface instead of defining new interfaces or drivers. To work properly, this requires a matching change in SBI, though the actual interface between kernel and SBI does not change. The main differences are a the overflow CSR and irq number. As the reading of the overflow-csr is in the hot-path during irq handling, use an errata and alternatives to not introduce new conditionals there. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/all/20221011231841.2951264-2-heiko@sntech.de/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-13RISC-V: Re-enable counter access from userspacePalmer Dabbelt
These counters were part of the ISA when we froze the uABI, removing them breaks userspace. Link: https://lore.kernel.org/all/YxEhC%2FmDW1lFt36J@aurel32.net/ Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20220928131807.30386-1-palmer@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-09Merge tag 'riscv-for-linus-6.1-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Improvements to the CPU topology subsystem, which fix some issues where RISC-V would report bad topology information. - The default NR_CPUS has increased to XLEN, and the maximum configurable value is 512. - The CD-ROM filesystems have been enabled in the defconfig. - Support for THP_SWAP has been added for rv64 systems. There are also a handful of cleanups and fixes throughout the tree. * tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: enable THP_SWAP for RV64 RISC-V: Print SSTC in canonical order riscv: compat: s/failed/unsupported if compat mode isn't supported RISC-V: Increase range and default value of NR_CPUS cpuidle: riscv-sbi: Fix CPU_PM_CPU_IDLE_ENTER_xyz() macro usage perf: RISC-V: throttle perf events perf: RISC-V: exclude invalid pmu counters from SBI calls riscv: enable CD-ROM file systems in defconfig riscv: topology: fix default topology reporting arm64: topology: move store_cpu_topology() to shared code
2022-09-08perf: RISC-V: fix access beyond allocated arraySergey Matyukevich
SBI firmware should report total number of firmware and hardware counters including unused ones or special ones. In this case the kernel doesn't need to make any assumptions about gaps in reported counters, e.g. excluded timer counter. That was fixed in OpenSBI v1.1 by commit 3f66465fb6bf ("lib: pmu: allow to use the highest available counter"). This kernel patch has no effect if SBI firmware behaves correctly. However it eliminates access beyond the allocated pmu_ctr_list if the kernel is used with OpenSBI older than v1.1. Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220830155306.301714-2-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-09-08perf: RISC-V: throttle perf eventsSergey Matyukevich
Call perf_sample_event_took() to report time spent in overflow interrupts. Perf core uses these measurements to throttle perf events properly. Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20220830155306.301714-4-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>