|
Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"Boot code changes:
- A large series of changes to reorganize the x86 boot code into a
better isolated and easier to maintain base of PIC early startup
code in arch/x86/boot/startup/, by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86
boot code reorganization.
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav
Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz
Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorganize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix
CONFIG_HARDENED_USERCOPY=y (Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean
Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary
(Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris
Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo
Molnar)
- Make smp_text_poke_batch_process() subsume
smp_text_poke_batch_finish() (Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs
(David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen
Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami
Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
<asm/amd/fch.h> (Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"
* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
x86/bugs: Fix spectre_v2 mitigation default on Intel
x86/bugs: Restructure ITS mitigation
x86/xen/msr: Fix uninitialized variable 'err'
x86/msr: Remove a superfluous inclusion of <asm/asm.h>
x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
x86/mm/64: Make 5-level paging support unconditional
x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
x86/mm/64: Always use dynamic memory layout
x86/bugs: Fix indentation due to ITS merge
x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
x86/mm: Fix kernel-doc descriptions of various pgtable methods
x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
x86/boot: Defer initialization of VM space related global variables
...
|
|
Throttle support has been added to the generic code, so remove the
driver-specific throttle support.
Besides throttling, perf_event_overflow() may also return true because of
event_limit; in that case it already disables the event in atomic context,
so the pmu->stop() call is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250520181644.2673067-5-kan.liang@linux.intel.com
|
|
For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.
To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h>
to <asm/tsc.h> and to eventually eliminate the inclusion of
<asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency
to the source files that reference definitions from <asm/msr.h>.
[ mingo: Clarified the changelog. ]
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Check sample_type in perf_sample_save_brstack() to prevent
saving branch stack data when it isn't required.
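A rough sketch of the kind of guard this adds inside the helper (not the
verbatim kernel hunk):

    /* Skip saving branch data when the event never asked for it. */
    if (!(event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK))
        return;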
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240515193610.2350456-4-yabinc@google.com
|
|
Warning on every leaked bit can cause a flood of messages, triggering
various stall-warning mechanisms to fire, including CSD locks, which makes
the machine unusable.
Track the bits that are being leaked and only warn when a new bit is set
(see the sketch below).
This patch helps with the following issues:
1) It tells us which bits are being set, so it is easy to report them
   back to the vendor and do a root-cause analysis.
2) It keeps the machine usable, because in the worst case the user gets
   fewer than 60 WARNs (one per unhandled bit).
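A minimal sketch of the "warn only on newly seen bits" idea; the variable
names are illustrative, not the actual kernel symbols:

    static u64 warned_bits;			/* bits already reported */
    u64 reserved = status & ~amd_pmu_global_cntr_mask;
    u64 new_bits = reserved & ~warned_bits;

    if (new_bits) {
        warned_bits |= new_bits;
        WARN(1, "Unhandled reserved status bits: 0x%llx\n", new_bits);
    }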
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
|
|
The current perf code assumes that both the GP and the fixed counters are
contiguous, but that is not guaranteed on newer Intel platforms or in a
virtualization environment.
Use a counter mask instead of the number of counters for both the GP and
the fixed counters. For other architectures or older platforms which don't
provide a counter mask, use GENMASK_ULL(num_counter - 1, 0) as a
replacement. There is no functional change for them.
The interface to KVM is not changed: the number of counters is still
passed to KVM. It can be updated separately later.
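Illustrative only: how a dense counter range maps onto a mask, and how a
(possibly sparse) mask is walked; handle_counter() is a hypothetical
stand-in for the per-counter work:

    unsigned long cntr_mask = GENMASK_ULL(num_counter - 1, 0);
    int idx;

    for_each_set_bit(idx, &cntr_mask, X86_PMC_IDX_MAX)
        handle_counter(idx);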
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
|
|
Replace this pattern in events/amd/{un,}core.c:
cmpxchg(*ptr, old, new) == old
... with the simpler and faster:
try_cmpxchg(*ptr, &old, new)
The x86 CMPXCHG instruction returns success in the ZF flag, so this change
saves a compare after the CMPXCHG.
No functional change intended.
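Illustrative before/after of the pattern ('ptr', 'old' and 'new' are
hypothetical locals):

    /* Before: compare the returned old value by hand. */
    if (cmpxchg(ptr, old, new) == old)
        /* won the race */;

    /* After: try_cmpxchg() reports success directly (via CMPXCHG's ZF)
     * and updates 'old' with the current value on failure.
     */
    if (try_cmpxchg(ptr, &old, new))
        /* won the race */;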
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240425101708.5025-1-ubizjak@gmail.com
|
|
Upstream commit c22ac2a3d4bd ("perf: Enable branch record for software
events") added ability to capture LBR (Last Branch Records) on Intel CPUs
from inside BPF program at pretty much any arbitrary point. This is
extremely useful capability that allows to figure out otherwise
hard to debug problems, because LBR is now available based on some
application-defined conditions, not just hardware-supported events.
'retsnoop' is one such tool that takes a huge advantage of this
functionality and has proved to be an extremely useful tool in
practice:
https://github.com/anakryiko/retsnoop
Now, AMD Zen4 CPUs got support for similar LBR functionality, but
necessary wiring inside the kernel is not yet setup. This patch seeks to
rectify this and follows a similar approach to the original patch
for Intel CPUs. We implement an AMD-specific callback set to be called
through perf_snapshot_branch_stack static call.
Previous preparatory patches ensured that amd_pmu_core_disable_all() and
__amd_pmu_lbr_disable() will be completely inlined and will have no
branches, so LBR snapshot contamination will be minimized.
This was tested on AMD Bergamo CPU and worked well when utilized from
the aforementioned retsnoop tool.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-4-andrii@kernel.org
|
|
In the following patches we will enable LBR capture on AMD CPUs at
arbitrary points in time, which means that LBR recording won't be frozen
automatically by hardware as part of a hardware overflow event. So we need
to minimize the number of branches and function calls/returns on the path
to freezing the LBRs, altering the LBR snapshot as little as possible.
amd_pmu_core_disable_all() is one of the functions on this path and is
already marked __always_inline. But it calls amd_pmu_set_global_ctl(),
which is marked as just inline. To guarantee that no function call is
generated throughout, mark amd_pmu_set_global_ctl() as __always_inline
as well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-2-andrii@kernel.org
|
|
Add the "ref-cycles" event for AMD processors based on Zen 4 and later
microarchitectures. The backing event is based on PMCx120 which counts
cycles not in halt state in P0 frequency (same as MPERF).
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/089155f19f7c7e65aeb1caa727a882e2ca9b8b04.1711352180.git.sandipan.das@amd.com
|
|
AMD processors based on Zen 2 and later microarchitectures do not
support PMCx087 (instruction pipe stalls) which is used as the backing
event for "stalled-cycles-frontend" and "stalled-cycles-backend".
Use PMCx0A9 (cycles where micro-op queue is empty) instead to count
frontend stalls and remove the entry for backend stalls since there
is no direct replacement.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h")
Link: https://lore.kernel.org/r/03d7fc8fa2a28f9be732116009025bdec1b3ec97.1711352180.git.sandipan.das@amd.com
|
|
Currently, the LBR code assumes that LBR Freeze is supported on all processors
when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX]
bit 1 is set. This is incorrect as the availability of the feature is
additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set,
which may not be set for all Zen 4 processors.
Define a new feature bit for LBR and PMC freeze and set the freeze enable bit
(FLBRI) in DebugCtl (MSR 0x1d9) conditionally.
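A rough sketch of the conditional freeze enable; the feature-bit and
DBG_CTL_FLBRI names are placeholders for illustration, not necessarily the
exact kernel symbols:

    u64 dbg_ctl;

    rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
    if (cpu_feature_enabled(X86_FEATURE_AMD_LBR_PMC_FREEZE))
        dbg_ctl |= DBG_CTL_FLBRI;	/* freeze LBRs/PMCs on PMI */
    else
        dbg_ctl &= ~DBG_CTL_FLBRI;
    wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);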
It should still be possible to use LBR without freeze for profile-guided
optimization of user programs by using a user-only branch filter during
profiling. When the user-only filter is enabled, branches are no longer
recorded after the transition to CPL 0 upon PMI arrival. When branch
entries are read in the PMI handler, the branch stack does not change.
E.g.
$ perf record -j any,u -e ex_ret_brn_tkn ./workload
Since the feature bit is visible under flags in /proc/cpuinfo, it can be
used to determine the feasibility of use-cases which require LBR Freeze
to be supported by the hardware such as profile-guided optimization of
kernels.
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.1711091584.git.sandipan.das@amd.com
|
|
When bringing a CPU online, some of the PMC and LBR related registers
are reset. The same is done when a CPU is taken offline although that
is unnecessary. This currently happens in the "cpu_dead" callback which
is also incorrect as the callback runs on a control CPU instead of the
one that is being taken offline. This also affects hibernation and
suspend to RAM on some platforms as reported in the link below.
Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support")
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/550a026764342cf7e5812680e3e2b91fe662b5ac.1706526029.git.sandipan.das@amd.com
|
|
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.
This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.
So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de
|
|
Merge tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Add branch stack counters ABI extension to better capture the growing
amount of information the PMU exposes via branch stack sampling.
There's matching tooling support.
- Fix race when creating the nr_addr_filters sysfs file
- Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support
- Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
support
- Misc cleanups & fixes
* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Factor out topology_gidnid_map()
perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
perf/x86/amd: Reject branch stack for IBS events
perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
perf/x86/intel/uncore: Support IIO free-running counters on GNR
perf/x86/intel/uncore: Support Granite Rapids
perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
perf: Fix the nr_addr_filters fix
perf/x86/intel/cstate: Add Grand Ridge support
perf/x86/intel/cstate: Add Sierra Forest support
x86/smp: Export symbol cpu_clustergroup_mask()
perf/x86/intel/cstate: Cleanup duplicate attr_groups
perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
perf/x86/intel: Support branch counters logging
perf/x86/intel: Reorganize attrs and is_visible
perf: Add branch_sample_call_stack
perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
perf: Add branch stack counters
|
|
Fix typos, most reported by "codespell arch/x86". Only touches comments,
no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
|
|
Currently, the additional information of a branch entry is stored in a
u64 space. With more and more information added, the space is running
out. For example, the information of occurrences of events will be added
for each branch.
Two places were suggested to append the counters.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. That changes the
existing struct perf_branch_entry, and later arch-specific implementations
would have to be very careful to consistently pick the right struct.
The other place is right after the entire struct perf_branch_stack. The
disadvantage is that the pointer to the extra space has to be recorded and
the common interface perf_sample_save_brstack() has to be updated.
The latter is much more straightforward and easier to understand and
maintain, so it is what this patch implements.
Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate
that event occurrences are recorded in the branch info.
The "u64 counters" may store the occurrences of several events. The
information regarding the number of events/counters and the width of
each counter should be exposed via sysfs as a reference for the perf
tool. Define the branch_counter_nr and branch_counter_width ABI here.
The support will be implemented later in the Intel-specific patch.
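A conceptual sketch of the resulting sample layout (not a verbatim uapi
quote):

    /*
     * PERF_SAMPLE_BRANCH_STACK with PERF_SAMPLE_BRANCH_COUNTERS:
     *
     *	u64                       nr;
     *	u64                       hw_idx;       // if BRANCH_HW_INDEX
     *	struct perf_branch_entry  entries[nr];
     *	u64                       counters[nr]; // appended by this patch
     */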
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
|
|
Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
handler, as shown below, several times while perf runs. A simple
`perf top` run is enough to render the system unusable:
WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
This happens because the Performance Counter Global Status Register
(PerfCntGlobalStatus) has one or more bits set which are considered
reserved according to the "AMD64 Architecture Programmer’s Manual,
Volume 2: System Programming, 24593":
https://www.amd.com/system/files/TechDocs/24593.pdf
To make this less intrusive, warn just once if any reserved bit is set
and prompt the user to update the microcode. Also sanitize the value to
what the code is handling, so that the overflow events continue to be
handled for the number of counters that are known to be sane.
Going forward, the following microcode patch levels are recommended
for Zen 4 processors in order to avoid such issues with reserved bits:
Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212
Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
the linux-firmware tree has binaries that meet the minimum required
patch levels.
[ sandipan: - add message to prompt users to update microcode
- rework commit message and call out required microcode levels ]
Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/3540f985652f41041e54ee82aa53e7dbd55739ae.1694696888.git.sandipan.das@amd.com/
|
|
Kernels older than v5.19 do not support PerfMonV2 and the PMI handler
does not clear the overflow bits of the PerfCntrGlobalStatus register.
Because of this, loading a recent kernel using kexec from an older
kernel can result in inconsistent register states on Zen 4 systems.
The PMI handler of the new kernel gets confused and shows a warning when
an overflow occurs because some of the overflow bits are set even if the
corresponding counters are inactive. These are remnants from overflows
that were handled by the older kernel.
During CPU hotplug, the PerfCntrGlobalCtl and PerfCntrGlobalStatus
registers should always be cleared for PerfMonV2-capable processors.
However, a condition used for NB event constraints applicable only to
older processors currently prevents this from happening. Move the reset
sequence to an appropriate place and also clear the LBR Freeze bit.
Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/882a87511af40792ba69bb0e9026f19a2e71e8a3.1694696888.git.sandipan.das@amd.com
|
|
Although IBS PMUs can be invoked via their own interface, indirect
IBS invocation via core pmu events is also supported with a fixed set
of events: cpu-cycles:p, r076:p (same as cpu-cycles:p) and r0C1:p
(micro-ops) for user convenience.
This indirect IBS invocation has been broken since commit 66d258c5b048
("perf/core: Optimize perf_init_event()"), which added the RAW pmu to the
'pmu_idr' list; as a result, if event_init() fails with the RAW pmu, an
error is returned instead of trying the other pmus.
Forward precise events from the core pmu to IBS by overwriting 'type' and
'config' in the kernel copy of perf_event_attr. Overwriting causes
perf_init_event() to retry with the updated 'type' and 'config', which
automatically forwards the event to the IBS pmu.
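A hedged sketch of the forwarding idea; the helper and PMU-type names are
illustrative, not the exact symbols added by the patch:

    if (event->attr.precise_ip && is_supported_core_event(&event->attr)) {
        event->attr.type   = ibs_op_pmu_type;	/* IBS execution PMU */
        event->attr.config = ibs_config_for(&event->attr);
        return -ENOENT;	/* perf_init_event() retries with the new type */
    }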
Without patch:
$ sudo ./perf record -C 0 -e r076:p -- sleep 1
Error:
The r076:p event is not supported.
With patch:
$ sudo ./perf record -C 0 -e r076:p -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.341 MB perf.data (37 samples) ]
Fixes: 66d258c5b048 ("perf/core: Optimize perf_init_event()")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230504110003.2548-3-ravi.bangoria@amd.com
|
|
The variable 'status' (which contains the unhandled overflow bits) is
not properly masked in some cases, resulting in the following warning:
WARNING: CPU: 156 PID: 475601 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x216/0x270
This happens because the loop continues before the status bit is cleared
when x86_perf_event_set_period() returns 0. This also causes an
inconsistency: the "handled" counter is incremented, but the status bit is
not cleared.
Move the bit clearing up, so that it happens together with the "handled"
counter increment.
Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230321113338.1669660-1-leitao@debian.org
|
|
When we save the branch stack to the perf sample data, we need to
update the sample flags and the dynamic size. To make sure this is
done consistently, add the perf_sample_save_brstack() helper and
convert all call sites.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
|
|
The left shift of the 32-bit integer constant 1 is evaluated using 32-bit
arithmetic and then passed as a 64-bit function argument. In cases where
i is 32 or more this leads to an overflow. Avoid this by shifting
using the BIT_ULL() macro instead.
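Illustrative before/after, assuming 'i' can be 32 or larger:

    u64 mask;

    mask = 1 << i;		/* 32-bit shift: broken once i reaches 31 */
    mask = BIT_ULL(i);	/* 64-bit shift: safe for i up to 63 */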
Fixes: 471af006a747 ("perf/x86/amd: Constrain Large Increment per Cycle events")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20221202135149.1797974-1-colin.i.king@gmail.com
|
|
Fix a crash due to a race between amd_pmu_enable_all(), the perf NMI
handler and event throttling:
amd_pmu_enable_all() does:
if (!test_bit(idx, cpuc->active_mask))
continue;
amd_pmu_enable_event(cpuc->events[idx]);
A perf NMI for another event can arrive between these two steps. The perf
NMI handler internally disables and enables _all_ events, including the one
which the NMI-interrupted amd_pmu_enable_all() was in the process of
enabling. If that unintentionally enabled event has a very low sampling
period and causes an immediate successive NMI that throttles the event,
cpuc->events[idx] and cpuc->active_mask get cleared by x86_pmu_stop().
This results in amd_pmu_enable_event() being called with event=NULL when
amd_pmu_enable_all() resumes after handling the NMIs, which causes a
kernel crash:
BUG: kernel NULL pointer dereference, address: 0000000000000198
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
[...]
Call Trace:
<TASK>
amd_pmu_enable_all+0x68/0xb0
ctx_resched+0xd9/0x150
event_function+0xb8/0x130
? hrtimer_start_range_ns+0x141/0x4a0
? perf_duration_warn+0x30/0x30
remote_function+0x4d/0x60
__flush_smp_call_function_queue+0xc4/0x500
flush_smp_call_function_queue+0x11d/0x1b0
do_idle+0x18f/0x2d0
cpu_startup_entry+0x19/0x20
start_secondary+0x121/0x160
secondary_startup_64_no_verify+0xe5/0xeb
</TASK>
The amd_pmu_disable_all()/amd_pmu_enable_all() calls inside the perf NMI
handler were recently added as part of BRS enablement, but I'm not sure
whether we really need them. We can just disable BRS at the beginning and
enable it back while returning from the NMI. This solves the issue by not
enabling those events whose active_mask bits are set but which are not yet
enabled in the hardware PMU.
Fixes: ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com
|
|
In preparation for making it a static_call, change the signature.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org
|
|
Use the new sample_flags to indicate whether the branch stack is filled
by the PMU driver.
Remove the br_stack from the perf_sample_data_init() to minimize the number
of cache lines touched.
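A rough sketch of the new convention in a driver's sampling path (the
lbr_stack field is used here purely as an example):

    if (has_branch_stack(event)) {
        data->br_stack = &cpuc->lbr_stack;
        data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
    }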
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
|
|
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
convert the requested branch filter (PERF_SAMPLE_BRANCH_* flags) to the
corresponding hardware filter value and stash it in the event data when
a branch stack is requested. The hardware filter value is also saved in
per-CPU areas for use during event scheduling.
Hardware filtering is provided by the LBR Branch Select register. It has
bits which, when set, suppress recording of the following types of branches
(a translation sketch follows the list):
* CPL = 0 (Kernel only)
* CPL > 0 (Userspace only)
* Conditional Branches
* Near Relative Calls
* Near Indirect Calls
* Near Returns
* Near Indirect Jumps (excluding Near Indirect Calls and Near Returns)
* Near Relative Jumps (excluding Near Relative Calls)
* Far Branches
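A hedged sketch of the filter translation; the LBR_SELECT_* names are
placeholders for the suppress bits listed above, and br_type stands for the
requested PERF_SAMPLE_BRANCH_* flags:

    u64 lbr_select = 0;

    if (!(br_type & PERF_SAMPLE_BRANCH_KERNEL))
        lbr_select |= LBR_SELECT_CPL_EQ0;	/* suppress CPL == 0 */
    if (!(br_type & PERF_SAMPLE_BRANCH_USER))
        lbr_select |= LBR_SELECT_CPL_GT0;	/* suppress CPL > 0 */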
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/9336af5c9785b8e14c62220fc0e6cfb10ab97de3.1660211399.git.sandipan.das@amd.com
|
|
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
enable it alongside LBR Freeze on PMI when an event requests branch stack
i.e. PERF_SAMPLE_BRANCH_STACK.
Each branch record is represented by a pair of registers, LBR From and LBR
To. The freeze feature prevents any updates to these registers once a PMC
overflows. The contents remain unchanged until the freeze bit is cleared by
the PMI handler.
The branch records are read and copied to sample data before unfreezing.
However, only valid entries are copied. There is no additional register to
denote which of the register pairs represent the top of the stack (TOS)
since internal register renaming always ensures that the first pair (i.e.
index 0) is the one representing the most recent branch and so on.
The LBR registers are per-thread resources and are cleared explicitly
whenever a new task is scheduled in. There are no special implications on
the contents of these registers when transitioning to deep C-states.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/d3b8500a3627a0d4d0259b005891ee248f248d91.1660211399.git.sandipan.das@amd.com
|
|
AMD Last Branch Record Extension Version 2 (LbrExtV2) is driven by Core PMC
overflows. It records recently taken branches up to the moment when the PMC
overflow occurs.
Detect the feature during PMU initialization and set the branch stack depth
using CPUID leaf 0x80000022 EBX.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fc6e45378ada258f1bab79b0de6e05c393a8f1dd.1660211399.git.sandipan.das@amd.com
|
|
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, introduce new static
calls which act as gateways to call into the feature-dependent functions
based on what is available on the processor.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b75dbc32663cb395f0d701167e952c6a6b0445a3.1660211399.git.sandipan.das@amd.com
|
|
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, reuse the "branches"
capability to advertise information about both BRS and LBR but make the
"branch-brs" event exclusive to Family 19h processors that support BRS.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ba4a4cde6db79b1c65c49834027bbdb8a915546b.1660211399.git.sandipan.das@amd.com
|
|
Move some of the Branch Sampling (BRS) specific functions out of the Core
events sources and into the BRS sources in preparation for adding other
mechanisms to record branches.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b60283b57179475d18ee242d117c335c16733693.1660211399.git.sandipan.das@amd.com
|
|
Commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only
counting with SVM disabled") addresses an issue in which the
Host-Only bit in the counter control registers needs to be
masked off when SVM is not enabled.
The events need to be reloaded whenever SVM is enabled or
disabled for a CPU and this requires the PERF_CTL registers
to be reprogrammed using {enable,disable}_all(). However,
PerfMonV2 variants of these functions do not reprogram the
PERF_CTL registers. Hence, the legacy enable_all() function
should also be called.
Fixes: 9622e67e3980 ("perf/x86/amd/core: Add PerfMonV2 counter control")
Reported-by: Like Xu <likexu@tencent.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220518084327.464005-1-sandipan.das@amd.com
|
|
There are two problems with the current amd_brs_adjust_period() code:
- it isn't in fact AMD-specific and will always adjust the period;
- it adjusts the period, while it should only adjust the event count,
resulting in reporting a short period.
Fix this by using x86_pmu.limit_period; this makes it specific to the
AMD BRS case and ensures only the event count is adjusted while the
reported period is unmodified.
Fixes: ba2fe7500845 ("perf/x86/amd: Add AMD branch sampling period adjustment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
'hwc' is never used in amd_pmu_enable_all(), so remove it.
Signed-off-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421111031.174698-1-zhengzucheng@huawei.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to process Core PMC overflows
in the NMI handler using the new global control and status
registers. This will be bypassed on unsupported hardware
(x86_pmu.version < 2).
In x86_pmu_handle_irq(), overflows are detected by testing
the contents of the PERF_CTR register for each active PMC in
a loop. The new scheme instead inspects the overflow bits of
the global status register.
The Performance Counter Global Status (PerfCntrGlobalStatus)
register has overflow (PerfCntrOvfl) bits for each PMC. This
is, however, a read-only MSR. To acknowledge that overflows
have been processed, the NMI handler must clear the bits by
writing to the PerfCntrGlobalStatusClr register.
In x86_pmu_handle_irq(), PMCs counting the same event that
are started and stopped at the same time record slightly
different counts due to delays in between reads from the
PERF_CTR registers. This is fixed by stopping and starting
the PMCs at the same time with a single write to the
Performance Counter Global Control (PerfCntrGlobalCtl)
register upon entering and before exiting the NMI handler.
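A hedged sketch of the new overflow path (the stop/start of the global
control register around it is omitted; amd_pmu_global_cntr_mask is meant
as an illustrative mask of the known counters):

    u64 status;
    int idx;

    rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
    status &= amd_pmu_global_cntr_mask;

    for_each_set_bit(idx, (unsigned long *)&status, X86_PMC_IDX_MAX) {
        /* read the counter, update the event, possibly stop it */
    }

    /* Acknowledge handled overflows; the status register is read-only. */
    wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);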
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f20b7e4da0b0a83bdbe05857f354146623bc63ab.1650515382.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to manage the Core PMCs using
the new global control and status registers. This will be
bypassed on unsupported hardware (x86_pmu.version < 2).
Currently, all PMCs have dedicated control (PERF_CTL) and
counter (PERF_CTR) registers. For a given PMC, the enable
(En) bit of its PERF_CTL register is used to start or stop
counting.
The Performance Counter Global Control (PerfCntrGlobalCtl)
register has enable (PerfCntrEn) bits for each PMC. For a
PMC to start counting, both PERF_CTL and PerfCntrGlobalCtl
enable bits must be set. If either of those are cleared,
the PMC stops counting.
In x86_pmu_{en,dis}able_all(), the PERF_CTL registers of
all active PMCs are written to in a loop. Ideally, PMCs
counting the same event that were started and stopped at
the same time should record the same counts. Due to delays
in between writes to the PERF_CTL registers across loop
iterations, the PMCs cannot be enabled or disabled at the
same instant and hence, record slightly different counts.
This is fixed by enabling or disabling all active PMCs at
the same time with a single write to the PerfCntrGlobalCtl
register.
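A rough sketch of the single-write gating this enables (MSR name as defined
in the kernel's msr-index.h; the helper shape is illustrative):

    static __always_inline void amd_pmu_set_global_ctl(u64 ctl)
    {
        wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, ctl);
    }

    /* enable: ctl = mask of active PMCs; disable: ctl = 0 */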
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/dfe8e934074aaabc6ba748dfaccd0a77c974bb82.1650515382.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use CPUID leaf 0x80000022 EBX to detect the
number of Core PMCs. This offers more flexibility if the
counts change in later processor families.
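A rough sketch of the detection, assuming NumCorePmc sits in EBX[3:0] of
leaf 0x80000022 as documented in the PPR (x86_pmu.num_counters is the
field name of that era; the kernel decodes the leaf through a dedicated
union):

    if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) {
        unsigned int ebx = cpuid_ebx(0x80000022);

        x86_pmu.num_counters = ebx & 0xf;
    }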
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/68a6d9688df189267db26530378870edd34f7b06.1650515382.git.sandipan.das@amd.com
|
|
AMD Performance Monitoring Version 2 (PerfMonV2) introduces
some new Core PMU features such as detection of the number
of available PMCs and managing PMCs using global registers
namely, PerfCntrGlobalCtl and PerfCntrGlobalStatus.
Clearing PerfCntrGlobalCtl and PerfCntrGlobalStatus ensures
that all PMCs are inactive and have no pending overflows
when CPUs are onlined or offlined.
The PMU version (x86_pmu.version) now indicates PerfMonV2
support and will be used to bypass the new features on
unsupported processors.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/dc8672ecbddff394e088ca8abf94b089b8ecc2e7.1650515382.git.sandipan.das@amd.com
|
|
On AMD Fam19h Zen3, the branch sampling (BRS) feature must be disabled before
entering low power and re-enabled (if was active) when returning from low
power. Otherwise, the NMI interrupt may be held up for too long and cause
problems. Stopping BRS will cause the NMI to be delivered if it was held up.
Define a perf_amd_brs_lopwr_cb() callback to stop/restart BRS. The callback
is protected by a jump label which is enabled only when AMD BRS is detected.
In all other cases, the callback is never called.
Signed-off-by: Stephane Eranian <eranian@google.com>
[peterz: static_call() and build fixes]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-10-eranian@google.com
|
|
Add a pseudo event called branch-brs to help use the AMD Fam19h
Branch Sampling feature (BRS). BRS samples taken branches, so it is best used
when sampling on a retired taken branch event (0xc4) which is what BRS
captures. Instead of trying to remember the event code or actual event name,
users can simply do:
$ perf record -b -e cpu/branch-brs/ -c 1000037 .....
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-5-eranian@google.com
|
|
Add support for the AMD Fam19h 16-deep branch sampling feature as
described in the AMD PPR Fam19h Model 01h Revision B1. This is a model
specific extension. It is not an architected AMD feature.
The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
registers. There is no branch type filtering. All control flow changes are
captured. BRS relies on specific programming of the core PMU of Fam19h. In
particular, the following requirements must be met:
- the sampling period be greater than 16 (BRS depth)
- the sampling period must use a fixed and not frequency mode
BRS interacts with the NMI interrupt as well. Because enabling BRS is
expensive, it is only activated after P event occurrences, where P is the
desired sampling period. At P occurrences of the event, the counter
overflows, the CPU catches the interrupt, activates BRS for 16 branches until
it saturates, and then delivers the NMI to the kernel. Between the overflow
and the time BRS activates more branches may be executed skewing the period.
All along, the sampling event keeps counting. The skid may be attenuated by
reducing the sampling period by 16 (subsequent patch).
BRS is integrated into perf_events seamlessly via the same
PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
records in the sampling buffer. No prediction information is supported. The
branches are stored in reverse order of execution. The most recent branch is
the first entry in each record.
No modification to the perf tool is necessary.
BRS can be used with any sampling event. However, it is recommended to use
the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
captures.
$ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
$ perf report -D
56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
... branch stack: nr:16
..... 0: 0000000000401d24 -> 0000000000401d5a 0 cycles 0
..... 1: 0000000000401d5c -> 0000000000401d24 0 cycles 0
..... 2: 0000000000401d22 -> 0000000000401d5c 0 cycles 0
..... 3: 0000000000401d5e -> 0000000000401d22 0 cycles 0
..... 4: 0000000000401d20 -> 0000000000401d5e 0 cycles 0
..... 5: 0000000000401d3e -> 0000000000401d20 0 cycles 0
..... 6: 0000000000401d42 -> 0000000000401d3e 0 cycles 0
..... 7: 0000000000401d3c -> 0000000000401d42 0 cycles 0
..... 8: 0000000000401d44 -> 0000000000401d3c 0 cycles 0
..... 9: 0000000000401d3a -> 0000000000401d44 0 cycles 0
..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles 0
..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles 0
..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles 0
..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles 0
..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles 0
..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles 0
... thread: test:18230
...... dso: test
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com
|
|
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
|
|
The Last Level Cache ID is returned by amd_get_nb_id(). In practice,
this value is the same as the AMD NodeId for callers of this function.
The NodeId is saved in struct cpuinfo_x86.cpu_die_id.
Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and
remove the function.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
|
|
Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"),
claimed L2 misses were unsupported, due to them not being found in its
referenced documentation, whose link has now moved [1].
That old documentation listed PMCx064 unit mask bit 3 as:
"LsRdBlkC: LS Read Block C S L X Change to X Miss."
and bit 0 as:
"IcFillMiss: IC Fill Miss"
We now have new public documentation [2] with improved descriptions, that
clearly indicate what events those unit mask bits represent:
Bit 3 now clearly states:
"LsRdBlkC: Data Cache Req Miss in L2 (all types)"
and bit 0 is:
"IcFillMiss: Instruction Cache Req Miss in L2."
So we can now add support for L2 misses in perf's genericised events as
PMCx064 with both the above unit masks.
[1] The commit's original documentation reference, "Processor Programming
Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors",
originally available here:
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
is now available here:
https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf
[2] "Processor Programming Reference (PPR) for Family 17h Model 31h,
Revision B0 Processors", available here:
https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf
Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h")
Reported-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com
|
|
Description of hardware operation
---------------------------------
The core AMD PMU has a 4-bit wide per-cycle increment for each
performance monitor counter. That works for most events, but
now with AMD Family 17h and above processors, some events can
occur more than 15 times in a cycle. Those events are called
"Large Increment per Cycle" events. In order to count these
events, two adjacent h/w PMCs get their count signals merged
to form 8 bits per cycle total. In addition, the PERF_CTR count
registers are merged to be able to count up to 64 bits.
Normally, events like instructions retired, get programmed on a single
counter like so:
PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff
PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count
The next counter at MSRs 0xc0010202-3 remains unused, or can be used
independently to count something else.
When counting Large Increment per Cycle events, such as FLOPs,
however, we now have to reserve the next counter and program the
PERF_CTL (config) register with the Merge event (0xFFF), like so:
PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff
PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b
PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit
PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count
The count is widened from the normal 48-bits to 64 bits by having the
second counter carry the higher 16 bits of the count in its lower 16
bits of its counter register.
The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge
event before the even counter, PERF_CTL0.
The Large Increment feature is available starting with Family 17h.
For more details, search any Family 17h PPR for the "Large Increment
per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this
version:
https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
Description of software operation
---------------------------------
The following steps are taken in order to support reserving and
enabling the extra counter for Large Increment per Cycle events:
1. In the main x86 scheduler, we reduce the number of available
counters by the number of Large Increment per Cycle events being
scheduled, tracked by a new cpuc variable 'n_pair' and a new
amd_put_event_constraints_f17h(). This improves the counter
scheduler success rate.
2. In perf_assign_events(), if a counter is assigned to a Large
Increment event, we increment the current counter variable, so the
counter used for the Merge event is removed from assignment
consideration by upcoming event assignments.
3. In find_counter(), if a counter has been found for the Large
Increment event, we set the next counter as used, to prevent other
events from using it.
4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e.,
we add Merge event accounting to the existing used_mask logic.
5. Finally, we add on the programming of Merge event to the
neighbouring PMC counters in the counter enable/disable{_all}
code paths.
Currently, software does not support a single PMU with mixed 48- and
64-bit counting, so Large increment event counts are limited to 48
bits. In set_period, we zero-out the upper 16 bits of the count, so
the hardware doesn't copy them to the even counter's higher bits.
Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm
vaddps instruction executed in a loop 100 million times:
perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload>
Performance counter stats for '<workload>':
800,000,000 cpu/fp_ret_sse_avx_ops.all/u
300,042,101 cpu/instructions/u
Prior to this patch, the reported SSE/AVX FLOPs retired count would
be wrong.
[peterz: lots of renames and edits to the code]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
AMD Family 17h processors and above gain support for Large Increment
per Cycle events. Unfortunately there is no CPUID or equivalent bit
that indicates whether the feature exists or not, so we continue to
determine eligibility based on a CPU family number comparison.
For Large Increment per Cycle events, we add a f17h-and-compatibles
get_event_constraints_f17h() that returns an even counter bitmask:
Large Increment per Cycle events can only be placed on PMCs 0, 2,
and 4 out of the currently available 0-5. The only currently
public event that requires this feature to report valid counts
is PMCx003 "Retired SSE/AVX Operations".
Note that the CPU family logic in amd_core_pmu_init() is changed
so as to be able to selectively add initialization for features
available in ranges of backward-compatible CPU families. This
Large Increment per Cycle feature is expected to be retained
in future families.
A side-effect of assigning a new get_constraints function for f17h
disables calling the old (prior to f15h) amd_get_event_constraints
implementation left enabled by commit e40ed1542dd7 ("perf/x86: Add perf
support for AMD family-17h processors"), which is no longer
necessary since those North Bridge event codes are obsoleted.
Also fix a spelling mistake whilst in the area (calulating ->
calculating).
Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com
|
|
'-Wunused-but-set-variable' triggers this warning:
arch/x86/events/amd/core.c: In function amd_pmu_handle_irq:
arch/x86/events/amd/core.c:656:6: warning: variable active set but not used [-Wunused-but-set-variable]
GCC is right, 'active' is not used anymore.
This variable was introduced earlier this year and then removed in:
df4d29732fdad perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
[ mingo: Improved the changelog, fixed build warning caused by this fix, improved surrounding code. ]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Cc: <acme@kernel.org>
Cc: <alexander.shishkin@linux.intel.com>
Cc: <mark.rutland@arm.com>
Cc: <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191110094453.113001-1-zhengyongjun3@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|