summaryrefslogtreecommitdiff
path: root/arch/x86/events/intel/lbr.c
AgeCommit message (Collapse)Author
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-04-10x86/msr: Rename 'wrmsrl_safe()' to 'wrmsrq_safe()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-08perf/x86: Add dynamic constraintKan Liang
More and more features require a dynamic event constraint, e.g., branch counter logging, auto counter reload, Arch PEBS, etc. Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It avoids keeping adding the individual flag in intel_cpuc_prepare(). Add a variable dyn_constraint in the struct hw_perf_event to track the dynamic constraint of the event. Apply it if it's updated. Apply the generic dynamic constraint for branch counter logging. Many features on and after V6 require dynamic constraint. So unconditionally set the flag for V6+. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
2025-03-17perf/x86: Remove swap_task_ctx()Kan Liang
The pmu specific data is saved in task_struct now. It doesn't need to swap between context. Remove swap_task_ctx() support. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250314172700.438923-6-kan.liang@linux.intel.com
2025-03-17perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide modeKan Liang
In the system-wide mode, LBR callstacks are shorter in comparison to the per-process mode. LBR MSRs are reset during a context switch in the system-wide mode. For the LBR call stack, the LBRs should be always saved/restored during a context switch. Use the space in task_struct to save/restore the LBR call stack data. For a system-wide event, it's unnecessagy to update the lbr_callstack_users for each threads. Add a variable in x86_pmu to indicate whether the system-wide event is active. Fixes: 76cb2c617f12 ("perf/x86/intel: Save/restore LBR stack during context switch") Reported-by: Andi Kleen <ak@linux.intel.com> Reported-by: Alexey Budankov <alexey.budankov@linux.intel.com> Debugged-by: Alexey Budankov <alexey.budankov@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250314172700.438923-5-kan.liang@linux.intel.com
2025-03-17perf: Supply task information to sched_task()Kan Liang
To save/restore LBR call stack data in system-wide mode, the task_struct information is required. Extend the parameters of sched_task() to supply task_struct information. When schedule in, the LBR call stack data for new task will be restored. When schedule out, the LBR call stack data for old task will be saved. Only need to pass the required task_struct information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250314172700.438923-4-kan.liang@linux.intel.com
2024-05-13Merge tag 'x86-cpu-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - Rework the x86 CPU vendor/family/model code: introduce the 'VFM' value that is an 8+8+8 bit concatenation of the vendor/family/model value, and add macros that work on VFM values. This simplifies the addition of new Intel models & families, and simplifies existing enumeration & quirk code. - Add support for the AMD 0x80000026 leaf, to better parse topology information - Optimize the NUMA allocation layout of more per-CPU data structures - Improve the workaround for AMD erratum 1386 - Clear TME from /proc/cpuinfo as well, when disabled by the firmware - Improve x86 self-tests - Extend the mce_record tracepoint with the ::ppin and ::microcode fields - Implement recovery for MCE errors in TDX/SEAM non-root mode - Misc cleanups and fixes * tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Switch to new Intel CPU model defines x86/tsc_msr: Switch to new Intel CPU model defines x86/tsc: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/resctrl: Switch to new Intel CPU model defines x86/microcode/intel: Switch to new Intel CPU model defines x86/mce: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/cpu/intel_epb: Switch to new Intel CPU model defines x86/aperfmperf: Switch to new Intel CPU model defines x86/apic: Switch to new Intel CPU model defines perf/x86/msr: Switch to new Intel CPU model defines perf/x86/intel/uncore: Switch to new Intel CPU model defines perf/x86/intel/pt: Switch to new Intel CPU model defines perf/x86/lbr: Switch to new Intel CPU model defines perf/x86/intel/cstate: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h x86/cpu/vfm: Add new macros to work with (vendor/family/model) values ...
2024-04-25perf/x86/lbr: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181500.41519-1-tony.luck%40intel.com
2024-04-11perf/x86/intel: Expose existence of callback support to KVMSean Christopherson
Add a "has_callstack" field to the x86_pmu_lbr structure used to pass information to KVM, and set it accordingly in x86_perf_get_lbr(). KVM will use has_callstack to avoid trying to create perf LBR events with PERF_SAMPLE_BRANCH_CALL_STACK on CPUs that don't support callstacks. Reviewed-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-27perf/x86/intel: Support branch counters loggingKan Liang
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. It can provide a means to attribute exposed retirement latency to combinations of events across a block of instructions. It also provides a means of attributing Timed LBR latencies to events. The feature is first introduced on SRF/GRR. It is an enhancement of the ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences of events on the GP counters. The information is displayed by the order of counters. The design proposed in this patch requires that the events which are logged must be in a group with the event that has LBR. If there are more than one LBR group, the counters logging information only from the current group (overflowed) are stored for the perf tool, otherwise the perf tool cannot know which and when other groups are scheduled especially when multiplexing is triggered. The user can ensure it uses the maximum number of counters that support LBR info (4 by now) by making the group large enough. The HW only logs events by the order of counters. The order may be different from the order of enabling which the perf tool can understand. When parsing the information of each branch entry, convert the counter order to the enabled order, and store the enabled order in the extension space. Unconditionally reset LBRs for an LBR event group when it's deleted. The logged counter information is only valid for the current LBR group. If another LBR group is scheduled later, the information from the stale LBRs would be otherwise wrongly interpreted. Add a sanity check in intel_pmu_hw_config(). Disable the feature if other counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode is enabled. (For the LBR call stack mode, we cannot simply flush the LBR, since it will break the call stack. Also, there is no obvious usage with the call stack mode for now.) Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch stack setup. Expose the maximum number of supported counters and the width of the counters into the sysfs. The perf tool can use the information to parse the logged counters in each branch. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2022-12-27perf/x86/lbr: Simplify the exposure check for the LBR_INFO registersLike Xu
The x86_pmu.lbr_info is 0 unless explicitly initialized, so there's no point checking x86_pmu.intel_cap.lbr_format. Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20221125040604.5051-2-weijiang.yang@intel.com
2022-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
2022-11-09perf/x86/core: Zero @lbr instead of returning -1 in x86_perf_get_lbr() stubSean Christopherson
Drop the return value from x86_perf_get_lbr() and have the stub zero out the @lbr structure instead of returning -1 to indicate "no LBR support". KVM doesn't actually check the return value, and instead subtly relies on zeroing the number of LBRs in intel_pmu_init(). Formalize "nr=0 means unsupported" so that KVM doesn't need to add a pointless check on the return value to fix KVM's benign bug. Note, the stub is necessary even though KVM x86 selects PERF_EVENTS and the caller exists only when CONFIG_KVM_INTEL=y. Despite the name, KVM_INTEL doesn't strictly require CPU_SUP_INTEL, it can be built with any of INTEL || CENTAUR || ZHAOXIN CPUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27perf: Rewrite core context handlingPeter Zijlstra
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
2022-10-20perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()Maxim Levitsky
clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap() except that the latter also sets a bit in 'cpu_caps_cleared' which later clears the same cap in secondary cpus, which is likely what is meant here. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220718141123.136106-2-mlevitsk@redhat.com
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-08-27perf/x86: Move branch classifierSandipan Das
Commit 3e702ff6d1ea ("perf/x86: Add LBR software filter support for Intel CPUs") introduces a software branch filter which complements the hardware branch filter and adds an x86 branch classifier. Move the branch classifier to arch/x86/events/ so that it can be utilized by other vendors for branch record filtering. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/bae5b95470d6bd49f40954bd379f414f5afcb965.1660211399.git.sandipan.das@amd.com
2022-08-19perf/x86/lbr: Enable the branch type for the Arch LBR by defaultKan Liang
On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
2022-07-20perf/x86/intel/lbr: Fix unchecked MSR access error on HSWKan Liang
The fuzzer triggers the below trace. [ 7763.384369] unchecked MSR access error: WRMSR to 0x689 (tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4 (native_write_msr+0x4/0x20) [ 7763.397420] Call Trace: [ 7763.399881] <TASK> [ 7763.401994] intel_pmu_lbr_restore+0x9a/0x1f0 [ 7763.406363] intel_pmu_lbr_sched_task+0x91/0x1c0 [ 7763.410992] __perf_event_task_sched_in+0x1cd/0x240 On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when the TSX is disabled, a TSX quirk is required to access LBR from registers. The lbr_from_signext_quirk_needed() is introduced to determine whether the TSX quirk should be applied. However, the lbr_from_signext_quirk_needed() is invoked before the intel_pmu_lbr_init(), which parses the LBR format information. Without the correct LBR format information, the TSX quirk never be applied. Move the lbr_from_signext_quirk_needed() into the intel_pmu_lbr_init(). Checking x86_pmu.lbr_has_tsx in the lbr_from_signext_quirk_needed() is not required anymore. Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have LBR_TSX flag, but only the LBR_FORMAT_EIP_FLAGS2 requirs the quirk. Update the comments accordingly. Fixes: 1ac7fd8159a8 ("perf/x86/intel/lbr: Support LBR format V7") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220714182630.342107-1-kan.liang@linux.intel.com
2022-04-05perf/core: Add perf_clear_branch_entry_bitfields() helperStephane Eranian
Make it simpler to reset all the info fields on the perf_branch_entry by adding a helper inline function. The goal is to centralize the initialization to avoid missing a field in case more are added. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322221517.2510440-2-eranian@google.com
2022-03-01perf: Add irq and exception return branch typesAnshuman Khandual
This expands generic branch type classification by adding two more entries there in i.e irq and exception return. Also updates the x86 implementation to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1645681014-3346-1-git-send-email-anshuman.khandual@arm.com
2022-01-18x86/perf: Avoid warning for Arch LBR without XSAVEAndi Kleen
Some hypervisors support Arch LBR, but without the LBR XSAVE support. The current Arch LBR init code prints a warning when the xsave size (0) is unexpected. Avoid printing the warning for the "no LBR XSAVE" case. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211215204029.150686-1-ak@linux.intel.com
2022-01-18perf/x86/intel/lbr: Add static_branch for LBR INFO flagsPeter Zijlstra (Intel)
Using static_branch to replace the LBR INFO flags to optimize the LBR INFO parsing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/1641315077-96661-2-git-send-email-peterz@infradead.org
2022-01-18perf/x86/intel/lbr: Support LBR format V7Peter Zijlstra (Intel)
The Goldmont plus and Tremont have LBR format V7. The V7 has LBR_INFO, which is the same as LBR format V5. But V7 doesn't support TSX. Without the patch, the associated misprediction and cycles information in the LBR_INFO may be lost on a Goldmont plus platform. For Tremont, the patch only impacts the non-PEBS events. Because of the adaptive PEBS, the LBR_INFO is always processed for a PEBS event. Currently, two different ways are used to check the LBR capabilities, which make the codes complex and confusing. For the LBR format V4 and earlier, the global static lbr_desc array is used to store the flags for the LBR capabilities in each LBR format. For LBR format V5 and V6, the current code checks the version number for the LBR capabilities. There are common LBR capabilities among LBR format versions. Several flags for the LBR capabilities are introduced into the struct x86_pmu. The flags, which can be shared among LBR formats, are used to check the LBR capabilities. Add intel_pmu_lbr_init() to set the flags accordingly at boot time. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/1641315077-96661-1-git-send-email-peterz@infradead.org
2021-11-11perf/x86/lbr: Reset LBR_SELECT during vlbr resetWanpeng Li
lbr_select in kvm guest has residual data even if kvm guest is poweroff. We can get residual data in the next boot. Because lbr_select is not reset during kvm vlbr release. Let's reset LBR_SELECT during vlbr reset. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1636096851-36623-1-git-send-email-wanpengli@tencent.com
2021-09-13perf: Enable branch record for software eventsSong Liu
The typical way to access branch record (e.g. Intel LBR) is via hardware perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other hand, LBR could also be useful in non-PMI scenario. For example, in kretprobe or bpf fexit program, LBR could provide a lot of information on what happened with the function. Add API to use branch record for software use. Note that, when the software event triggers, it is necessary to stop the branch record hardware asap. Therefore, static_call is used to remove some branch instructions in this process. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
2021-07-07Merge tag 'x86-fpu-2021-07-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: "Fixes and improvements for FPU handling on x86: - Prevent sigaltstack out of bounds writes. The kernel unconditionally writes the FPU state to the alternate stack without checking whether the stack is large enough to accomodate it. Check the alternate stack size before doing so and in case it's too small force a SIGSEGV instead of silently corrupting user space data. - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been updated despite the fact that the FPU state which is stored on the signal stack has grown over time which causes trouble in the field when AVX512 is available on a CPU. The kernel does not expose the minimum requirements for the alternate stack size depending on the available and enabled CPU features. ARM already added an aux vector AT_MINSIGSTKSZ for the same reason. Add it to x86 as well. - A major cleanup of the x86 FPU code. The recent discoveries of XSTATE related issues unearthed quite some inconsistencies, duplicated code and other issues. The fine granular overhaul addresses this, makes the code more robust and maintainable, which allows to integrate upcoming XSTATE related features in sane ways" * tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again x86/fpu/signal: Let xrstor handle the features to init x86/fpu/signal: Handle #PF in the direct restore path x86/fpu: Return proper error codes from user access functions x86/fpu/signal: Split out the direct restore code x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing() x86/fpu/signal: Sanitize the xstate check on sigframe x86/fpu/signal: Remove the legacy alignment check x86/fpu/signal: Move initial checks into fpu__restore_sig() x86/fpu: Mark init_fpstate __ro_after_init x86/pkru: Remove xstate fiddling from write_pkru() x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate() x86/fpu: Remove PKRU handling from switch_fpu_finish() x86/fpu: Mask PKRU from kernel XRSTOR[S] operations x86/fpu: Hook up PKRU into ptrace() x86/fpu: Add PKRU storage outside of task XSAVE buffer x86/fpu: Dont restore PKRU in fpregs_restore_userspace() x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi() x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs() x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs() ...
2021-06-24perf/x86/intel/lbr: Zero the xstate buffer on allocationThomas Gleixner
XRSTORS requires a valid xstate buffer to work correctly. XSAVES does not guarantee to write a fully valid buffer according to the SDM: "XSAVES does not write to any parts of the XSAVE header other than the XSTATE_BV and XCOMP_BV fields." XRSTORS triggers a #GP: "If bytes 63:16 of the XSAVE header are not all zero." It's dubious at best how this can work at all when the buffer is not zeroed before use. Allocate the buffers with __GFP_ZERO to prevent XRSTORS failure. Fixes: ce711ea3cab9 ("perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/87wnr0wo2z.ffs@nanos.tec.linutronix.de
2021-06-23x86/fpu/xstate: Sanitize handling of independent featuresThomas Gleixner
The copy functions for the independent features are horribly named and the supervisor and independent part is just overengineered. The point is that the supplied mask has either to be a subset of the independent features or a subset of the task->fpu.xstate managed features. Rewrite it so it checks for invalid overlaps of these areas in the caller supplied feature mask. Rename it so it follows the new naming convention for these operations. Mop up the function documentation. This allows to use that function for other purposes as well. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210623121455.004880675@linutronix.de
2021-06-23x86/fpu: Rename "dynamic" XSTATEs to "independent"Andy Lutomirski
The salient feature of "dynamic" XSTATEs is that they are not part of the main task XSTATE buffer. The fact that they are dynamically allocated is irrelevant and will become quite confusing when user math XSTATEs start being dynamically allocated. Rename them to "independent" because they are independent of the main XSTATE code. This is just a search-and-replace with some whitespace updates to keep things aligned. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/1eecb0e4f3e07828ebe5d737ec77dc3b708fad2d.1623388344.git.luto@kernel.org Link: https://lkml.kernel.org/r/20210623121454.911450390@linutronix.de
2021-05-18perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic contextLike Xu
If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional might_sleep_if() deep in kmem_cache_alloc() will generate the following trace, and potentially cause a deadlock when another LBR event is added: [] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 [] Call Trace: [] kmem_cache_alloc+0x36/0x250 [] intel_pmu_lbr_add+0x152/0x170 [] x86_pmu_add+0x83/0xd0 Make it symmetric with the release_lbr_buffers() call and mirror the existing DS buffers. Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simplified] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210430052247.3079672-2-like.xu@linux.intel.com
2021-04-28Merge tag 'perf-core-2021-04-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: - Improve Intel uncore PMU support: - Parse uncore 'discovery tables' - a new hardware capability enumeration method introduced on the latest Intel platforms. This table is in a well-defined PCI namespace location and is read via MMIO. It is organized in an rbtree. These uncore tables will allow the discovery of standard counter blocks, but fancier counters still need to be enumerated explicitly. - Add Alder Lake support - Improve IIO stacks to PMON mapping support on Skylake servers - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived) cores. The CPU-side feature set is entirely symmetrical - but on the PMU side there's core type dependent PMU functionality. - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by fixing the AUX allocation watermark logic. - Improve ring buffer allocation on NUMA systems - Put 'struct perf_event' into their separate kmem_cache pool - Add support for synchronous signals for select perf events. The immediate motivation is to support low-overhead sampling-based race detection for user-space code. The feature consists of the following main changes: - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits inheritance of events to CLONE_THREAD. - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec. - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64 ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is PERF_TYPE_BREAKPOINT. The siginfo support is adequate for breakpoints right now - but the new field can be used to introduce support for other types of metadata passed over siginfo as well. - Misc fixes, cleanups and smaller updates. * tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) signal, perf: Add missing TRAP_PERF case in siginfo_layout() signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures perf/x86: Allow for 8<num_fixed_counters<16 perf/x86/rapl: Add support for Intel Alder Lake perf/x86/cstate: Add Alder Lake CPU support perf/x86/msr: Add Alder Lake CPU support perf/x86/intel/uncore: Add Alder Lake support perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE perf/x86/intel: Add Alder Lake Hybrid support perf/x86: Support filter_match callback perf/x86/intel: Add attr_update for Hybrid PMUs perf/x86: Add structures for the attributes of Hybrid PMUs perf/x86: Register hybrid PMUs perf/x86: Factor out x86_pmu_show_pmu_cap perf/x86: Remove temporary pmu assignment in event_init perf/x86/intel: Factor out intel_pmu_check_extra_regs perf/x86/intel: Factor out intel_pmu_check_event_constraints perf/x86/intel: Factor out intel_pmu_check_num_counters perf/x86: Hybrid PMU support for extra_regs perf/x86: Hybrid PMU support for event constraints ...
2021-04-27Merge tag 'x86_core_for_v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-04-19perf/x86: Track pmu in per-CPU cpu_hw_eventsKan Liang
Some platforms, e.g. Alder Lake, have hybrid architecture. In the same package, there may be more than one type of CPU. The PMU capabilities are different among different types of CPU. Perf will register a dedicated PMU for each type of CPU. Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated PMU of the current CPU. Current x86_get_pmu() use the global 'pmu', which will be broken on a hybrid platform. Modify it to apply the 'pmu' of the specific CPU. Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is nothing changed for the non-hybrid platforms. The is_x86_event() will be updated in the later patch ("perf/x86: Register hybrid PMUs") for hybrid platforms. For the non-hybrid platforms, nothing is changed here. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-4-git-send-email-kan.liang@linux.intel.com
2021-03-18x86: Fix various typos in commentsIngo Molnar
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-15perf/x86/intel/ds: Check return values of insn decoder functionsBorislav Petkov
branch_type() doesn't need to call the full insn_decode() because it doesn't need it in all cases thus leave the calls separate. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210304174237.31945-9-bp@alien8.de
2020-12-14Merge tag 'perf-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Better handling of page table leaves on archictectures which have architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. - Prevent a deadlock vs exec_update_mutex Architectures: - The related updates for page size calculation of leaf entries - The usual churn to support new CPUs - Small fixes and improvements all over the place" * tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/intel: Add Tremont Topdown support uprobes/x86: Fix fall-through warnings for Clang perf/x86: Fix fall-through warnings for Clang kprobes/x86: Fix fall-through warnings for Clang perf/x86/intel/lbr: Fix the return type of get_lbr_cycles() perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake x86/kprobes: Restore BTF if the single-stepping is cancelled perf: Break deadlock involving exec_update_mutex sparc64/mm: Implement pXX_leaf_size() support powerpc/8xx: Implement pXX_leaf_size() support arm64/mm: Implement pXX_leaf_size() support perf/core: Fix arch_perf_get_page_size() mm: Introduce pXX_leaf_size() mm/gup: Provide gup_get_pte() more generic perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY perf/x86/intel/uncore: Add Rocket Lake support perf/x86/msr: Add Rocket Lake CPU support perf/x86/cstate: Add Rocket Lake CPU support perf/x86/intel: Add Rocket Lake CPU support perf,mm: Handle non-page-table-aligned hugetlbfs ...
2020-12-09perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()Kan Liang
The cycle count of a timed LBR is always 1 in perf record -D. The cycle count is stored in the first 16 bits of the IA32_LBR_x_INFO register, but the get_lbr_cycles() return Boolean type. Use u16 to replace the Boolean type. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20201125213720.15692-2-kan.liang@linux.intel.com
2020-10-26perf/x86: Avoid TIF_IA32 when checking 64bit modeGabriel Krisman Bertazi
In preparation to remove TIF_IA32, stop using it in perf events code. Tested by running perf on 32-bit, 64-bit and x32 applications. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20201004032536.1229030-2-krisman@collabora.com
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-07-08perf/x86/intel/lbr: Support XSAVES for arch LBR readKan Liang
Reading LBR registers in a perf NMI handler for a non-PEBS event causes a high overhead because the number of LBR registers is huge. To reduce the overhead, the XSAVES instruction should be used to replace the LBR registers' reading method. The XSAVES buffer used for LBR read has to be per-CPU because the NMI handler invoked the lbr_read(). The existing task_ctx_data buffer cannot be used which is per-task and only be allocated for the LBR call stack mode. A new lbr_xsave pointer is introduced in the cpu_hw_events as an XSAVES buffer for LBR read. The XSAVES buffer should be allocated only when LBR is used by a non-PEBS event on the CPU because the total size of the lbr_xsave is not small (~1.4KB). The XSAVES buffer is allocated when a non-PEBS event is added, but it is lazily released in x86_release_hardware() when perf releases the entire PMU hardware resource, because perf may frequently schedule the event, e.g. high context switch. The lazy release method reduces the overhead of frequently allocate/free the buffer. If the lbr_xsave fails to be allocated, roll back to normal Arch LBR lbr_read(). Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/1593780569-62993-24-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switchKan Liang
In the LBR call stack mode, LBR information is used to reconstruct a call stack. To get the complete call stack, perf has to save/restore all LBR registers during a context switch. Due to a large number of the LBR registers, this process causes a high CPU overhead. To reduce the CPU overhead during a context switch, use the XSAVES/XRSTORS instructions. Every XSAVE area must follow a canonical format: the legacy region, an XSAVE header and the extended region. Although the LBR information is only kept in the extended region, a space for the legacy region and XSAVE header is still required. Add a new dedicated structure for LBR XSAVES support. Before enabling XSAVES support, the size of the LBR state has to be sanity checked, because: - the size of the software structure is calculated from the max number of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR. The size of the LBR state is enumerated by the CPUID leaf for XSAVE support of Arch LBR. If the values from the two CPUID leaves are not consistent, it may trigger a buffer overflow. For example, a hypervisor may unconsciously set inconsistent values for the two emulated CPUID. - unlike other state components, the size of an LBR state depends on the max number of LBRs, which may vary from generation to generation. Expose the function xfeature_size() for the sanity check. The LBR XSAVES support will be disabled if the size of the LBR state enumerated by CPUID doesn't match with the size of the software structure. The XSAVE instruction requires 64-byte alignment for state buffers. A new macro is added to reflect the alignment requirement. A 64-byte aligned kmem_cache is created for architecture LBR. Currently, the structure for each state component is maintained in fpu/types.h. The structure for the new LBR state component should be maintained in the same place. Move structure lbr_entry to fpu/types.h as well for broader sharing. Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support, which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR information at the context switch when the call stack mode is enabled. Since the XSAVES/XRSTORS instructions will be eventually invoked, the dedicated functions is named with '_xsaves'/'_xrstors' postfix. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86: Remove task_ctx_sizeKan Liang
A new kmem_cache method has replaced the kzalloc() to allocate the PMU specific data. The task_ctx_size is not required anymore. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-19-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Create kmem_cache for the LBR context dataKan Liang
A new kmem_cache method is introduced to allocate the PMU specific data task_ctx_data, which requires the PMU specific code to create a kmem_cache. Currently, the task_ctx_data is only used by the Intel LBR call stack feature, which is introduced since Haswell. The kmem_cache should be only created for Haswell and later platforms. There is no alignment requirement for the existing platforms. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-18-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Support Architectural LBRKan Liang
Last Branch Records (LBR) enables recording of software path history by logging taken branches and other control flows within architectural registers now. Intel CPUs have had model-specific LBR for quite some time, but this evolves them into an architectural feature now. The main improvements of Architectural LBR implemented includes: - Linux kernel can support the LBR features without knowing the model number of the current CPU. - Architectural LBR capabilities can be enumerated by CPUID. The lbr_ctl_map is based on the CPUID Enumeration. - The possible LBR depth can be retrieved from CPUID enumeration. The max value is written to the new MSR_ARCH_LBR_DEPTH as the number of LBR entries. - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs, which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR. - Each LBR record or entry is still comprised of three MSRs, IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_TO_IP. But they become the architectural MSRs. - Architectural LBR is stack-like now. Entry 0 is always the youngest branch, entry 1 the next youngest... The TOS MSR has been removed. The way to enable/disable Architectural LBR is similar to the previous model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused, but some modifications are required, which include: - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural LBR. - When checking the value of the IA32_DEBUGCTL MSR, ignoring the DEBUGCTLMSR_LBR (bit 0) for Architectural LBR, which has no meaning and always return 0. - The FREEZE_LBRS_ON_PMI has to be explicitly set/clear, because MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for Architectural LBR. - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for Architectural LBR. Some Architectural LBR dedicated functions are implemented to reset/read/save/restore LBR. - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR entries, which is a lot faster and can improve the context switch latency. - For read, the branch type information can be retrieved from the MSR_ARCH_LBR_INFO_*. But it's not fully compatible due to OTHER_BRANCH type. The software decoding is still required for the OTHER_BRANCH case. LBR records are stored in the age order as well. Reuse intel_pmu_store_lbr(). Check the CPUID enumeration before accessing the corresponding bits in LBR_INFO. - For save/restore, applying the fast reset (writing ARCH_LBR_DEPTH). Reading 'lbr_from' of entry 0 instead of the TOS MSR to check if the LBR registers are reset in the deep C-state. If 'the deep C-state reset' bit is not set in CPUID enumeration, ignoring the check. XSAVE support for Architectural LBR will be implemented later. The number of LBR entries cannot be hardcoded anymore, which should be retrieved from CPUID enumeration. A new structure x86_perf_task_context_arch_lbr is introduced for Architectural LBR. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Factor out intel_pmu_store_lbrKan Liang
The way to store the LBR information from a PEBS LBR record can be reused in Architecture LBR, because - The LBR information is stored like a stack. Entry 0 is always the youngest branch. - The layout of the LBR INFO MSR is similar. The LBR information may be retrieved from either the LBR registers (non-PEBS event) or a buffer (PEBS event). Extend rdlbr_*() to support both methods. Explicitly check the invalid entry (0s), which can avoid unnecessary MSR access if using a non-PEBS event. For a PEBS event, the check should slightly improve the performance as well. The invalid entries are cut. The intel_pmu_lbr_filter() doesn't need to check and filter them out. Cannot share the function with current model-specific LBR read, because the direction of the LBR growth is opposite. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-14-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all()Kan Liang
The previous model-specific LBR and Architecture LBR (legacy way) use a similar method to save/restore the LBR information, which directly accesses the LBR registers. The codes which read/write a set of LBR registers can be shared between them. Factor out two functions which are used to read/write a set of LBR registers. Add lbr_info into structure x86_pmu, and use it to replace the hardcoded LBR INFO MSR, because the LBR INFO MSR address of the previous model-specific LBR is different from Architecture LBR. The MSR address should be assigned at boot time. For now, only Sky Lake and later platforms have the LBR INFO MSR. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-13-git-send-email-kan.liang@linux.intel.com
2020-07-08perf/x86/intel/lbr: Mark the {rd,wr}lbr_{to,from} wrappers __always_inlineKan Liang
The {rd,wr}lbr_{to,from} wrappers are invoked in hot paths, e.g. context switch and NMI handler. They should be always inline to achieve better performance. However, the CONFIG_OPTIMIZE_INLINING allows the compiler to uninline functions marked 'inline'. Mark the {rd,wr}lbr_{to,from} wrappers as __always_inline to force inline the wrappers. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-12-git-send-email-kan.liang@linux.intel.com