Age | Commit message | Author |
|
This commit adds a --do-rcu-rust parameter to torture.sh, which invokes
a rust_doctests_kernel kunit run. Note that kunit wants a clean source
tree, so this runs "make mrproper", which might come as a surprise to
some users. Should there be a --mrproper parameter to torture.sh to make
the user explicitly ask for it?
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
|
|
Right now, torture.sh runs normal runs unconditionally, which can be slow
and thus annoying when you only want to test --kcsan or --kasan runs.
This commit therefore adds a --do-normal argument so that "--kcsan
--do-no-kasan --do-no-normal" runs only KCSAN runs. Note that specifying
"--do-no-kasan --do-no-kcsan --do-no-normal" gets normal runs, so you
should not try to use this as a synonym for --do-none.
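The selection rule described above can be modeled roughly as follows. This is a hypothetical Python sketch of the documented behavior only, not the torture.sh implementation; the function and parameter names are invented for illustration.

```python
def selected_run_types(do_normal, do_kasan, do_kcsan):
    """Return the run types implied by the three --do-*/--do-no-* choices.

    Models only the behavior stated in the commit message.
    """
    runs = []
    if do_normal:
        runs.append("normal")
    if do_kasan:
        runs.append("kasan")
    if do_kcsan:
        runs.append("kcsan")
    # Per the commit message, disabling all three still yields normal runs,
    # so that combination is not a synonym for --do-none.
    if not runs:
        runs.append("normal")
    return runs
```

For instance, disabling normal and KASAN runs while keeping KCSAN leaves only "kcsan", whereas disabling all three falls back to "normal".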
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
|
|
When an XDR counted array has a maximum element count, xdrgen adds
a bounds check to the encoder or decoder for that type. But in cases
where the .x provides no maximum element count, such as
    struct notify4 {
            /* composed from notify_type4 or notify_deviceid_type4 */
            bitmap4         notify_mask;
            notifylist4     notify_vals;
    };
    struct CB_NOTIFY4args {
            stateid4        cna_stateid;
            nfs_fh4         cna_fh;
            notify4         cna_changes<>;
    };
xdrgen is supposed to omit that bounds check. Some of the Jinja2
templates handle that correctly, but a few are incorrect and leave
the bounds check in place with a maximum of zero, which causes
encoding/decoding of that type to fail unconditionally.
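The intended behavior can be illustrated with a small model. This is a hypothetical Python sketch, not xdrgen's generated code or its Jinja2 templates; check_count() and its parameters are invented names, and maximum=None stands in for a declaration without a maximum such as "notify4 cna_changes<>".

```python
def check_count(count, maximum=None):
    """Validate the element count of a counted array.

    maximum=None models a declaration with no maximum element count,
    in which case the bounds check is omitted entirely.
    """
    if maximum is not None and count > maximum:
        raise ValueError(f"count {count} exceeds maximum {maximum}")
    return count


def check_count_faulty(count):
    """Model of the faulty case: the check is kept with a maximum of zero."""
    return check_count(count, maximum=0)
```

In this model, check_count(3) succeeds, while check_count_faulty(3) raises, since any non-empty array exceeds a maximum of zero.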
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If perf is running with a benchmark then we want the retirement
latency samples associated with the benchmark rather than from the
system as a whole.
Use the workload's PID to filter out samples that aren't from the
workload or its children.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250430200108.243234-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When running 'make' in tools/testing/selftests/arm64/ without explicitly
setting the OUTPUT variable, the build system creates test
directories (e.g., /bti) in the root filesystem because OUTPUT defaults
to an empty string. This causes unintended pollution of the root directory.
This patch adds proper handling for the OUTPUT variable: it sets OUTPUT
to the current directory (.) if it is not specified.
Signed-off-by: tanze <tanze@kylinos.cn>
Link: https://lore.kernel.org/r/20250515051839.3409658-1-tanze@kylinos.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2),
ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
and MTE_frac == 0 indicates it is supported.
As MTE_frac was previously unconditionally read as 0 from the guest
and user-space, check that using SET_ONE_REG to set it to 0 succeeds
but does not change MTE_frac from unsupported (0xF) to supported (0).
This is required as values originating from KVM must be accepted when
written back from user-space, to avoid breaking migration.
Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE
for the set_id_regs test. No effect on existing tests is expected.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Explicitly specify LDFLAGS as an argument to CC so that this can be
overridden by the user.
Link: https://lore.kernel.org/all/20250328183858.1417835-3-bmasney@redhat.com/
Signed-off-by: Brian Masney <bmasney@redhat.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Allow overriding the CFLAGS assignment so that the user can pass in
an outside value.
Link: https://lore.kernel.org/all/20250328183858.1417835-2-bmasney@redhat.com/
Signed-off-by: Brian Masney <bmasney@redhat.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
'realpath' is not always available; fall back to 'readlink -f' if it is
not available. They seem to work equally well in this context.
Link: https://lore.kernel.org/r/20250318160510.3441646-1-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
TC needs arrays of nests, but just a put for now.
Fairly straightforward addition.
Link: https://patch.msgid.link/20250513222011.844106-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc7).
Conflicts:
tools/testing/selftests/drivers/net/hw/ncdevmem.c
97c4e094a4b2 ("tests/ncdevmem: Fix double-free of queue array")
2f1a805f32ba ("selftests: ncdevmem: Implement devmem TCP TX")
https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au
Adjacent changes:
net/core/devmem.c
net/core/devmem.h
0afc44d8cdf6 ("net: devmem: fix kernel panic when netlink socket close after module unload")
bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth and wireless.
A few more fixes for the locking changes trickling in. Nothing too
alarming, I suspect those will continue for another release. Other
than that things are slowing down nicely.
Current release - fix to a fix:
- Bluetooth: hci_event: use key encryption size when its known
- tools: ynl-gen: allow multi-attr without nested-attributes again
Current release - regressions:
- locking fixes:
- lock lower level devices when updating features
- eth: bnxt_en: bring back rtnl_lock() in the bnxt_open() path
- devmem: fix panic when Netlink socket closes after module unload
Current release - new code bugs:
- eth: txgbe: fixes for FW communication on new AML devices
Previous releases - always broken:
- sched: flush gso_skb list too during ->change(), avoid potential
null-deref on reconfig
- wifi: mt76: disable NAPI on driver removal
- hv_netvsc: fix error 'nvsp_rndis_pkt_complete error status: 2'"
* tag 'net-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
net: devmem: fix kernel panic when netlink socket close after module unload
tsnep: fix timestamping with a stacked DSA driver
net/tls: fix kernel panic when alloc_page failed
bnxt_en: bring back rtnl_lock() in the bnxt_open() path
mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices
wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request
octeontx2-pf: Do not reallocate all ntuple filters
wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl
wifi: mt76: disable napi on driver removal
Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
hv_netvsc: Remove rmsg_pgcnt
hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
octeontx2-af: Fix CGX Receive counters
net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability
net: libwx: Fix FW mailbox unknown command
net: libwx: Fix FW mailbox reply timeout
net: txgbe: Fix to calculate EEPROM checksum for AML devices
octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy
...
|
|
There is a known issue that leader sampling is inconsistent, since
throttling only affects the leader, not the slave. The details are in [1].
To maintain test coverage, this patch sets a tolerance rate of 80% to
accommodate the throttled samples and prevent test failures due to
throttling.
[1] lore.kernel.org/20250328182752.769662-1-ctshao@google.com
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20250430140611.599078-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To increase code coverage, extend the ovpn selftests with the following
cases:
* connect UDP peers using a mix of IPv6 and IPv4 at the transport layer
* run full test with tunnel MTU equal to transport MTU (exercising
IP layer fragmentation)
* ping "LAN IP" served by VPN peer ("LAN behind a client" test case)
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
getaddrinfo() may fail with an error code other than EAI_FAIL
or EAI_NONAME. In that case we still try to free the
results object, which leads to a crash.
Fix this by bailing out on any possible error.
Fixes: 959bc330a439 ("testing/selftests: add test tool and scripts for ovpn module")
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
The custom syncookie test expects TCPOPT_WINDOW to be 7 based on the
kernel’s behaviour at the time, but the upcoming series [0] will bump
it to 10.
Let's relax the test to allow any valid TCPOPT_WINDOW value in the
range 1–14.
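The relaxed check amounts to a range test. This is a minimal Python sketch of that idea, not the selftest's code; the function name is hypothetical, and "wscale" stands for the value the commit message refers to as TCPOPT_WINDOW.

```python
def window_scale_is_valid(wscale):
    """Accept any window scale value in the range 1 to 14, inclusive."""
    return 1 <= wscale <= 14
```

Both the old value 7 and the upcoming value 10 pass this check, while 0 and 15 do not.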
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/netdev/20250513193919.1089692-1-edumazet@google.com/ #[0]
Link: https://patch.msgid.link/20250514214021.85187-1-kuniyu@amazon.com
|
|
Avoid dereferencing bpf_map_skeleton's link field if it's NULL.
If a BPF map skeleton is created with a size that indicates it contains
the link field, but the field was not actually initialized with a valid
bpf_link pointer, libbpf crashes. This may happen when using a libbpf-rs
skeleton.
Skeleton loading may still progress, but the user needs to attach the
struct_ops map separately.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250514113220.219095-1-mykyta.yatsenko5@gmail.com
|
|
Most tracepoints in the kernel are created with TRACE_EVENT(). The
TRACE_EVENT() macro (like DECLARE_EVENT_CLASS() and DEFINE_EVENT(); in
reality, TRACE_EVENT() is just a helper macro that calls those other two
macros) creates not only a tracepoint (the function trace_<event>()
used in the kernel) but also exposes the tracepoint to user space, along
with defining which fields will be saved by that tracepoint.
There are a few places in the kernel where tracepoints are created that
are not exposed to userspace via tracefs. They can only be accessed from
code within the kernel. These tracepoints are created with DEFINE_TRACE().
Most of these tracepoints end with "_tp". This is useful as when the
developer sees that, they know that the tracepoint is for in-kernel only
(meaning it can only be accessed inside the kernel, either directly by the
kernel or indirectly via modules and BPF programs) and is not exposed to
user space.
Instead of leaving the "_tp" suffix as a mere convention, enforce it by
making DECLARE_TRACE() append the "_tp" suffix to the tracepoint. This
requires adding DECLARE_TRACE_EVENT() macros, which keep the original
name, for the TRACE_EVENT() macro to use.
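The naming rule can be modeled with plain string handling. This is a hypothetical Python sketch of the resulting function names only; it assumes the suffix lands at the end of the trace_<event>() name and does not reproduce the actual C macros.

```python
def declare_trace(name):
    """Model DECLARE_TRACE(): the in-kernel-only tracepoint gains a "_tp" suffix."""
    return f"trace_{name}_tp"


def declare_trace_event(name):
    """Model DECLARE_TRACE_EVENT(), used by TRACE_EVENT(): the name is kept."""
    return f"trace_{name}"
```

Under this model, an in-kernel-only tracepoint declared as "foo" is called as trace_foo_tp(), while a TRACE_EVENT() named "foo" is still called as trace_foo().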
Link: https://lore.kernel.org/all/20250418083351.20a60e64@gandalf.local.home/
Cc: netdev <netdev@vger.kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250510163730.092fad5b@gandalf.local.home
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The `stat+uniquify.sh` test retrieves all uniquified `clockticks` events
from `perf list -v clockticks` and checks whether `perf stat -e clockticks -A`
contains all of them.
Committer testing:
root@x1:~# grep -m1 "model name" /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
root@x1:~# perf list clockticks
List of pre-defined events (to be used in -e or -M):
uncore_clock/clockticks/ [Kernel PMU event]
uncore memory:
unc_m_clockticks
[Number of clocks. Unit: uncore_imc]
root@x1:~#
root@x1:~# perf test uniquifying
92: perf stat events uniquifying : Ok
root@x1:~# perf test -vv uniquifying
92: perf stat events uniquifying:
--- start ---
test child forked, pid 1552628
stat event uniquifying test
---- end(0) ----
92: perf stat events uniquifying : Ok
root@x1:~#
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250513215401.2315949-4-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The merge stat code fails for uncore events if they are repeated twice,
for example `perf stat -e clockticks,clockticks -I 1000` as the counts
of the second set of uncore events will be merged into the first
counter.
Reimplement the logic to have a first_wildcard_match so that merged
later events correctly merge into the first wildcard event that they
will be aggregated into.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Chun-Tse Shao <ctshao@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250513215401.2315949-3-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf stat' has different uniquification logic to 'perf record' and 'perf
top'. In the case of 'perf record' and 'perf top' all hybrid event names
are uniquified.
'perf stat' is more disciplined, respecting name config terms, libpfm4
events, etc.
'perf stat' will uniquify hybrid events and the non-core PMU cases
shouldn't apply to 'perf record' or 'perf top'.
For consistency, remove the uniquification for 'perf record' and 'perf
top' and reuse the 'perf stat' uniquification, making the code more
globally visible for this.
Fix the detection of cross-PMU for disabling uniquify by correctly
setting last_pmu.
When setting uniquify on an evsel, make sure the PMUs of the two events
being considered differ, otherwise uniquifying adds no value.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Chun-Tse Shao <ctshao@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250513215401.2315949-2-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After elf_update_group_sh_info() was introduced, a prototype version of
"objtool klp diff" went from taking ~1s to several minutes, due to
looping almost endlessly in elf_update_group_sh_info() while creating
thousands of local symbols in a file with thousands of sections.
Dramatically improve the performance by marking all symbols' correlated
SHT_GROUP sections while reading the object. That way there's no need
to search for it every time a symbol gets reindexed.
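The idea of marking the group section once and reusing it can be sketched as below. This is a hypothetical Python model, not objtool's data structures: sections are plain dicts, the helper names are invented, and the only ELF facts relied on are that SHT_GROUP is section type 17 and that a group section's sh_info holds the symbol table index of its signature symbol.

```python
SHT_GROUP = 17  # ELF section type for section groups


def mark_group_sections(sections):
    """One pass at read time: map signature symbol index -> SHT_GROUP section."""
    group_of = {}
    for sec in sections:
        if sec["type"] == SHT_GROUP:
            group_of[sec["info"]] = sec
    return group_of


def reindex_symbol(group_of, old_idx, new_idx):
    """Update the correlated group's sh_info without searching all sections."""
    sec = group_of.pop(old_idx, None)
    if sec is not None:
        sec["info"] = new_idx
        group_of[new_idx] = sec
    return sec
```

Each reindex then costs a dictionary lookup rather than a scan over every section, which is the kind of saving the commit describes for files with thousands of sections.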
Fixes: 2cb291596e2c ("objtool: Fix up st_info in COMDAT group section")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Rong Xu <xur@google.com>
Link: https://lkml.kernel.org/r/2a33e583c87e3283706f346f9d59aac20653b7fd.1746662991.git.jpoimboe@kernel.org
|
|
Patch series "eliminate mmap() retry merge, add .mmap_prepare hook", v2.
During the mmap() of a file-backed mapping, we invoke the underlying
driver file's mmap() callback in order to perform driver/file system
initialisation of the underlying VMA.
This has been a source of issues in the past, including a significant
security concern relating to unwinding of error state discovered by Jann
Horn, as fixed in commit 5de195060b2e ("mm: resolve faulty mmap_region()
error path behaviour") which performed the recent, significant, rework of
mmap() as a whole.
However, we have had a fly in the ointment remain - drivers have a great
deal of freedom in the .mmap() hook to manipulate VMA state (as well as
page table state).
This can be problematic, as we can no longer reason sensibly about VMA
state once the call is complete (the ability to do - anything - here does
rather interfere with that).
In addition, callers may choose to do odd or unusual things which might
interfere with subsequent steps in the mmap() process, and they may do so
and then raise an error, requiring very careful unwinding of state about
which we can make no assumptions.
Rather than providing such an open-ended interface, this series provides
an alternative, far more restrictive one - we expose a whitelist of fields
which can be adjusted by the driver, along with immutable state upon which
the driver can make such decisions:
    struct vm_area_desc {
            /* Immutable state. */
            struct mm_struct *mm;
            unsigned long start;
            unsigned long end;
            /* Mutable fields. Populated with initial state. */
            pgoff_t pgoff;
            struct file *file;
            vm_flags_t vm_flags;
            pgprot_t page_prot;
            /* Write-only fields. */
            const struct vm_operations_struct *vm_ops;
            void *private_data;
    };
The mmap logic then updates the state used to either merge with a VMA or
establish a new VMA based upon this logic.
This is achieved via new file hook .mmap_prepare(), which is, importantly,
invoked very early on in the mmap() process.
If an error arises, we can very simply abort the operation with very
little unwinding of state required.
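The flow described above can be modeled in a few lines. This is a hypothetical Python sketch rather than kernel code: VmAreaDesc mirrors the field list of struct vm_area_desc, while call_mmap_prepare() and the copy-back of whitelisted fields are invented for illustration of the early-abort and restricted-update idea.

```python
from dataclasses import dataclass
from typing import Any

# Fields a hook is allowed to influence in this model.
ADJUSTABLE = ("pgoff", "file", "vm_flags", "page_prot", "vm_ops", "private_data")


@dataclass
class VmAreaDesc:
    # Immutable state (changes to these are ignored by the model).
    mm: Any
    start: int
    end: int
    # Mutable fields, populated with initial state.
    pgoff: int
    file: Any
    vm_flags: int
    page_prot: int
    # Write-only fields.
    vm_ops: Any = None
    private_data: Any = None


def call_mmap_prepare(desc, hook):
    """Invoke the hook early; abort on error, else return only adjustable fields."""
    err = hook(desc)
    if err:
        # Nothing has been merged or established yet, so there is little to unwind.
        return err, None
    return 0, {name: getattr(desc, name) for name in ADJUSTABLE}
```

A hook that sets a flag sees it reflected in the returned dictionary, while a hook that returns a negative error code aborts the operation before any VMA state exists.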
The existing logic contains another, related, peccadillo - since the
.mmap() callback might do anything, it may also cause a previously
unmergeable VMA to become mergeable with adjacent VMAs.
Right now the logic will retry a merge like this only if the driver
changes VMA flags, and changes them in such a way that a merge might
succeed (that is, the flags are not 'special', that is do not contain any
of the flags specified in VM_SPECIAL).
This has also been the source of a great deal of pain - it's hard to
reason about an .mmap() callback that might do - anything - but it's also
hard to reason about setting up a VMA and writing to the maple tree, only
to do it again utilising a great deal of shared state.
Since .mmap_prepare() sets fields before the first merge is even
attempted, the use of this callback obviates the need for this retry merge
logic.
A driver may only specify .mmap_prepare() or the deprecated .mmap()
callback. In future we may add further callbacks beyond .mmap_prepare() to
facilitate all use cases as we convert drivers.
In researching this change, I examined every .mmap() callback, and
discovered only a very few that set VMA state in such a way that a. the
VMA flags changed and b. this would be mergeable.
In the majority of cases, it turns out that drivers are mapping kernel
memory and thus ultimately set VM_PFNMAP, VM_MIXEDMAP, or other
unmergeable VM_SPECIAL flags.
Of those that remain I identified a number of cases which are only
applicable in DAX, setting the VM_HUGEPAGE flag:
* dax_mmap()
* erofs_file_mmap()
* ext4_file_mmap()
* xfs_file_mmap()
For this remerge to not occur and to impact users, each of these cases
would require a user to mmap() files using DAX, in parts, immediately
adjacent to one another.
This is a very unlikely usecase and so it does not appear to be worthwhile
to adjust this functionality accordingly.
We can, however, very quickly do so if needed by simply adding an
.mmap_prepare() callback to these as required.
There are two further non-DAX cases I identified:
* orangefs_file_mmap() - Clears VM_RAND_READ if set, replacing with
VM_SEQ_READ.
* usb_stream_hwdep_mmap() - Sets VM_DONTDUMP.
Both of these cases again seem very unlikely to be mmap()'d immediately
adjacent to one another in a fashion that would result in a merge.
Finally, we are left with a viable case:
* secretmem_mmap() - Set VM_LOCKED, VM_DONTDUMP.
This is viable enough that the mm selftests trigger the logic as a matter
of course. Therefore, this series replaces the .secretmem_mmap() hook with
.secret_mmap_prepare().
This patch (of 3):
Provide a means by which drivers can specify which fields of those
permitted to be changed should be altered prior to mmap()'ing a range
(which may either result from a merge or from mapping an entirely new
VMA).
Doing so is substantially safer than the existing .mmap() callback which
provides unrestricted access to the part-constructed VMA and permits
drivers and file systems to do 'creative' things which make it hard to
reason about the state of the VMA after the function returns.
The existing .mmap() callback's freedom has caused a great deal of issues,
especially in error handling, as unwinding the mmap() state has proven to
be non-trivial and caused significant issues in the past, for instance
those addressed in commit 5de195060b2e ("mm: resolve faulty mmap_region()
error path behaviour").
It also necessitates a second attempt at merge once the .mmap() callback
has completed, which has caused issues in the past, is awkward, adds
overhead and is difficult to reason about.
The .mmap_prepare() callback eliminates this requirement, as we can update
fields prior to even attempting the first merge. It is safer, as we
heavily restrict what can actually be modified, and being invoked very
early in the mmap() process, error handling can be performed safely with
very little unwinding of state required.
The .mmap_prepare() and deprecated .mmap() callbacks are mutually
exclusive, so we permit only one to be invoked at a time.
Update vma userland test stubs to account for changes.
Link: https://lkml.kernel.org/r/cover.1746792520.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/adb36a7c4affd7393b2fc4b54cc5cfe211e41f71.1746792520.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
test_memcg_protection()
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:
  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M, memory.current = 50M
  child 1: memory.min/low = 25M, memory.current = 50M
  child 2: memory.min/low = 0,   memory.current = 50M
After applying memory pressure, the function expects the following actual
memory usages.
  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0
In reality, the actual memory usages can differ quite a bit from the
expected values. It uses an error tolerance of 10% with the
values_close() helper.
Both the test_memcg_min and test_memcg_low sub-tests can fail sporadically
because the actual memory usage exceeds the 10% error tolerance. Below
is a sample of the usage data from test runs that failed.
  Child   Actual usage   Expected usage   %err
  -----   ------------   --------------   ------
    1       16990208        22020096      -12.9%
    1       17252352        22020096      -12.1%
    0       37699584        30408704      +10.7%
    1       14368768        22020096      -21.0%
    1       16871424        22020096      -13.2%
The current 10% error tolerance might have been right at the time
test_memcontrol.c was first introduced in the v4.18 kernel, but memory
reclaim has certainly evolved quite a bit since then, which may result in
a bit more run-to-run variation than previously expected.
Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure. The tolerance is bigger for
child 1 because an upswing in child 0 corresponds to a smaller %err than a
similar downswing in child 1 due to the way %err is used in
values_close().
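The arithmetic behind the %err column can be reproduced as follows. This is a Python sketch under an assumption: the form of values_close() shown here, |a - b| <= (a + b) / 100 * err in integer arithmetic, is inferred from the tabulated %err values, which equal (actual - expected) / (actual + expected); percent_err() is a hypothetical helper added for illustration.

```python
def values_close(a, b, err):
    """Integer tolerance check; assumed to mirror the selftest helper's formula."""
    return abs(a - b) <= (a + b) // 100 * err


def percent_err(actual, expected):
    """Signed error in percent, relative to the sum of both values."""
    return 100.0 * (actual - expected) / (actual + expected)
```

Under this model, the first sample for child 1 (16990208 against 22020096) fails at 10% but passes at 20%, and the child 0 sample (37699584 against 30408704) fails at 10% but passes at 15%. The sample with a 21.0% error would still fail at 20%, so occasional failures remain possible.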
Before this patch, 100 test runs of test_memcontrol produced the
following results:
17 not ok 1 test_memcg_min
22 not ok 2 test_memcg_low
After applying this patch, there were no test failures for test_memcg_min
and test_memcg_low in 100 test runs. However, these tests may still fail
once in a while if the memory usage goes beyond the newly extended range.
Link: https://lkml.kernel.org/r/20250502010443.106022-3-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "memcg: Fix test_memcg_min/low test failures", v8.
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test (with memory_recursiveprot enabled) and sporadically fails its
test_memcg_min sub-test. This patchset fixes the test_memcg_min and
test_memcg_low failures by adjusting the test_memcontrol selftest.
This patch (of 8):
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because its 3rd test child cgroup, which has a memory.low of 0,
has a nonzero low event count. This happens when the
memory_recursiveprot mount option is enabled, which is the default setting
used by systemd to mount the cgroup2 filesystem.
This issue was originally fixed by commit cdc69458a5f3 ("cgroup: account
for memory_recursiveprot in test_memcg_low()"). It was later reverted by
commit 1d09069f5313 ("selftests: memcg: expect no low events in
unprotected sibling") expecting the memory reclaim code would be fixed.
However, it turns out the unprotected cgroup may still have some residual
effective memory.low protection depending on the memory.low settings in
its parent and its siblings. As a result, low events may still be
triggered.
One way to fix the test failure is to revert the revert commit. However,
Michal suggested that it might be better to ignore the low event count
with memory_recursiveprot enabled, as low events may or may not happen
depending on the actual test configuration.
Modify test_memcontrol.c to ignore low events in the 3rd child cgroup
with memory_recursiveprot on.
The 4th child cgroup has no memory usage and so has an effective low of 0.
It has no low event count because the mem_cgroup_below_low() check in
shrink_node_memcgs() is skipped as mem_cgroup_below_min() returns true.
If we ever change mem_cgroup_below_min() in such a way that it no longer
skips the no usage case, we will have to add code to explicitly skip it.
With this patch applied, the test_memcg_low sub-test finishes successfully
in most cases, though both the test_memcg_low and test_memcg_min
sub-tests may still fail occasionally if the memory.current values fall
outside of the expected ranges.
Link: https://lkml.kernel.org/r/20250502010443.106022-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20250502010443.106022-2-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Improve the installation procedure for the systemd service unit
'cpupower.service', to be more distro-agnostic. Do not install the
service unit configuration file to /etc/default/ (a directory that
is used by Debian and Debian-derivatives and only rarely by other
distros).
Also, clarify the role of the configuration file in its own comments.
Link: https://lore.kernel.org/linux-pm/20250509002206.bd2519ba52035d47c3c32aa6@paranoici.org/T/#ma8a3fa80acc4036af6c754e8ecabacc55b288ad1
Link: https://lore.kernel.org/r/20250513163937.61062-5-invernomuto@paranoici.org
Fixes: 9c70b779ad91 ("cpupower: add a systemd service to run cpupower")
Signed-off-by: Francesco Poli (wintermute) <invernomuto@paranoici.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Fix the installation procedure for the systemd service unit
'cpupower.service'. Do not call "systemctl daemon-reload" in the
Makefile, but explain when this command should be manually issued
in the README file.
Link: https://lore.kernel.org/linux-pm/20250509002206.bd2519ba52035d47c3c32aa6@paranoici.org/T/#mfbb938f9c0d5a21173acb92a061eb9205fd0abfe
Link: https://lore.kernel.org/r/20250513163937.61062-4-invernomuto@paranoici.org
Fixes: 9c70b779ad91 ("cpupower: add a systemd service to run cpupower")
Signed-off-by: Francesco Poli (wintermute) <invernomuto@paranoici.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Fix the use of DESTDIR variable in the Makefile, as far as the
installation of the systemd service unit 'cpupower.service' is
concerned.
This was caused by a misunderstanding about the purpose of the DESTDIR
variable in the Makefile, which is instead meant to support staged
installations: its value should not end up in installed file contents.
Link: https://lore.kernel.org/linux-pm/20250509002206.bd2519ba52035d47c3c32aa6@paranoici.org/T/#mfbb938f9c0d5a21173acb92a061eb9205fd0abfe
Link: https://www.gnu.org/prep/standards/html_node/DESTDIR.html
Link: https://lore.kernel.org/r/20250513163937.61062-3-invernomuto@paranoici.org
Fixes: 9c70b779ad91 ("cpupower: add a systemd service to run cpupower")
Signed-off-by: Francesco Poli (wintermute) <invernomuto@paranoici.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
These tests:
"SOCK_STREAM ioctl(SIOCOUTQ) 0 unsent bytes"
"SOCK_SEQPACKET ioctl(SIOCOUTQ) 0 unsent bytes"
output: "Unexpected 'SIOCOUTQ' value, expected 0, got 64 (CLIENT)".
They test that the SIOCOUTQ ioctl reports 0 unsent bytes after the data
have been received by the other side. However, sometimes there is a delay
in updating this "unsent bytes" counter, and the test fails even though
the counter properly goes to 0 several milliseconds later.
The delay occurs in the kernel because the used buffer notification
callback virtio_vsock_tx_done(), called upon receipt of the data by the
other side, doesn't update the counter itself. It delegates that to
a kernel thread (via vsock->tx_work). Sometimes that thread is delayed
more than the test expects.
Change the test to poll SIOCOUTQ until it returns 0 or a timeout occurs.
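A minimal sketch of the poll-until-zero-or-timeout pattern described above. The real test is written in C; the helper name, default timeout and interval below are assumptions made for illustration.

```python
import time


def poll_until_zero(read_counter, timeout_s=10.0, interval_s=0.01):
    """Call read_counter() until it returns 0 or timeout_s elapses.

    Returns the last value read; a non-zero result means the counter
    never drained within the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = read_counter()
        if value == 0 or time.monotonic() >= deadline:
            return value
        time.sleep(interval_s)
```

On Linux, where SIOCOUTQ is defined as TIOCOUTQ, `read_counter` could presumably wrap an ioctl on the connected socket; here it is left as a plain callable so that the polling logic stays independent of the transport.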
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Fixes: 18ee44ce97c1 ("test/vsock: add ioctl unsent bytes test")
Link: https://patch.msgid.link/20250507151456.2577061-1-kshk@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit ce6cb8113c84 ("tools: ynl-gen: individually free previous
values on double set"), specifying the "multi-attr" property raises an
error unless the "nested-attributes" property is specified as well:
File "tools/net/ynl/./pyynl/ynl_gen_c.py", line 1147, in _load_nested_sets
child = self.pure_nested_structs.get(nested)
^^^^^^
UnboundLocalError: cannot access local variable 'nested' where it is not associated with a value
This appears to be a bug since there are existing specs which omit
"nested-attributes" on "multi-attr" attributes. Also, according to
Documentation/userspace-api/netlink/specs.rst, multi-attr "is the
recommended way of implementing arrays (no extra nesting)", suggesting
that nesting should even be avoided in favor of multi-attr.
Fix the indentation of the if-block introduced by the commit to avoid
the error.
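The failure mode can be reproduced with a simplified stand-in. This is not the actual ynl_gen_c.py code; the function names and dictionary layout are hypothetical and only mirror the shape of the bug, a use of a variable outside the if-block that binds it.

```python
def lookup_buggy(attr, structs):
    # 'nested' is bound only when "nested-attributes" is present ...
    if "nested-attributes" in attr:
        nested = attr["nested-attributes"]
    # ... but this line runs unconditionally, so a plain multi-attr
    # attribute raises UnboundLocalError here.
    return structs.get(nested)


def lookup_fixed(attr, structs):
    # With the lookup indented into the if-block, attributes without
    # "nested-attributes" are simply skipped.
    if "nested-attributes" in attr:
        nested = attr["nested-attributes"]
        return structs.get(nested)
    return None
```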
Fixes: ce6cb8113c84 ("tools: ynl-gen: individually free previous values on double set")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/d6b58684b7e5bfb628f7313e6893d0097904e1d1.1746940107.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new summary mode to collect stats for each cgroup.
$ sudo ./perf trace -as --bpf-summary --summary-mode=cgroup -- sleep 1
Summary of events:
cgroup /user.slice/user-657345.slice/user@657345.service/session.slice/org.gnome.Shell@x11.service, 535 events
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
ppoll 15 0 373.600 0.004 24.907 197.491 55.26%
poll 15 0 1.325 0.001 0.088 0.369 38.76%
close 66 0 0.567 0.007 0.009 0.026 3.55%
write 150 0 0.471 0.001 0.003 0.010 3.29%
recvmsg 94 83 0.290 0.000 0.003 0.037 16.39%
ioctl 26 0 0.237 0.001 0.009 0.096 50.13%
timerfd_create 66 0 0.236 0.003 0.004 0.024 8.92%
timerfd_settime 70 0 0.160 0.001 0.002 0.012 7.66%
writev 10 0 0.118 0.001 0.012 0.019 18.17%
read 9 0 0.021 0.001 0.002 0.004 14.07%
getpid 14 0 0.019 0.000 0.001 0.004 20.28%
cgroup /system.slice/polkit.service, 94 events
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
ppoll 22 0 19.811 0.000 0.900 9.273 63.88%
write 30 0 0.040 0.001 0.001 0.003 12.09%
recvmsg 12 0 0.018 0.001 0.002 0.006 28.15%
read 18 0 0.013 0.000 0.001 0.003 21.99%
poll 12 0 0.006 0.000 0.001 0.001 4.48%
cgroup /user.slice/user-657345.slice/user@657345.service/app.slice/app-org.gnome.Terminal.slice/gnome-terminal-server.service, 21 events
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
ppoll 4 0 17.476 0.003 4.369 13.298 69.65%
recvmsg 15 12 0.068 0.002 0.005 0.014 26.53%
writev 1 0 0.033 0.033 0.033 0.033 0.00%
poll 1 0 0.005 0.005 0.005 0.005 0.00%
...
It works only for --bpf-summary for now.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250501225337.928470-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes we need to analyze the data at the process level, but the
current sort keys only work at the thread level. Let's add a 'tgid' sort
key for that, as 'pid' is already taken for threads.
The output will look mostly the same, but it uses tgid instead of tid.
Here's an example of a process with two threads (thloop).
$ perf record -- perf test -w thloop
$ perf report --stdio -s tgid,pid -H
...
#
# Overhead Tgid:Command / Pid:Command
# ........... ..........................
#
100.00% 2018407:perf
50.34% 2018407:perf
49.66% 2018409:perf
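The difference between the two sort keys can be sketched as a grouping step. This is an illustration only, not perf's hist-entry code; the sample layout (dicts with 'tgid' and 'tid' fields) and the function name are assumptions.

```python
from collections import defaultdict


def overhead_by(samples, key):
    """Return {key value: percent of samples}, grouping by 'tgid' or 'tid'."""
    counts = defaultdict(int)
    for sample in samples:
        counts[sample[key]] += 1
    total = len(samples)
    return {k: 100.0 * n / total for k, n in counts.items()}


# Two threads of one process, as in the thloop example above.
samples = [{"tgid": 2018407, "tid": 2018407},
           {"tgid": 2018407, "tid": 2018409}]
per_process = overhead_by(samples, "tgid")
per_thread = overhead_by(samples, "tid")
```

Grouping by 'tgid' folds both threads into a single process entry, while grouping by 'tid' keeps one entry per thread.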
Suggested-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250509210421.197245-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
While CPU is a system device, it'd be better to use a path for
event_source devices when it checks PMU capability.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250509213017.204343-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I found 'perf record LBR tests' failing due to empty branch stacks.
$ perf test -v LBR
...
LBR system wide any branch test
Lowering default frequency rate from 4000 to 1000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 3.142 MB /tmp/__perf_test.perf.data.dgSBl (3572 samples) ]
LBR system wide any branch test: 3572 samples
LBR system wide any branch test [Failed empty br stack ratio exceed 2%: 3%]
LBR system wide any call test
Lowering default frequency rate from 4000 to 1000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 3.337 MB /tmp/__perf_test.perf.data.dgSBl (3967 samples) ]
LBR system wide any call test: 3967 samples
LBR system wide any call test [Failed empty br stack ratio exceed 2%: 9%]
...
The failing cases were in system-wide mode and I realized that the
samples were from the idle tasks (swapper). I suspect going to/from
idle state may affect the LBR contents.
If we can skip empty branch stacks from the idle tasks, the failure
should go away. I can see the following output in perf report -D.
$ perf report -D | grep -m5 -A3 'branch stack: nr:0'
...
--
... branch stack: nr:0
... thread: swapper:0
...... dso: /proc/kcore
--
... branch stack: nr:0
... thread: swapper:0
...... dso: /proc/kcore
--
... branch stack: nr:0
... thread: DefaultEventMan:10282
...... dso: /proc/kcore
--
... branch stack: nr:0
... thread: swapper:0
...... dso: /proc/kcore
--
... branch stack: nr:0
... thread: swapper:0
...... dso: /proc/kcore
$ perf report -D | grep -c 'branch stack: nr:0'
145
$ perf report -D | grep -A3 'branch stack: nr:0' | grep thread | grep -c swapper
136
$ perf report -D | grep -A3 'branch stack: nr:0' | grep thread | grep -cv swapper
9
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250509213017.204343-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Syscall tables are generated from rules in the kernel tree. Add the
related files to the MANIFEST to fix the Perf source package build.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: bfb713ea53c746b0 ("perf tools: Fix arm64 build by generating unistd_64.h")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250513-james-perf-src-pkg-fix-v1-1-bcfd0486dbd6@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
On my alderlake I currently see for the "perf metrics value validation" test:
```
Total Test Count: 142
Passed Test Count: 139
[
Metric Relationship Error: The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
but expected value range is [tma_frontend_bound, tma_frontend_bound]
Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error: The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
but expected value range is [tma_backend_bound, tma_backend_bound]
Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error: The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
but expected value range is [tma_retiring, tma_retiring]
Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
Metric validation return with erros. Please check metrics reported with errors.
```
I suspect it is due to two metrics for different CPU types being
enabled. Add a --cputype option to avoid this. The test still fails with:
```
Total Test Count: 115
Passed Test Count: 114
[
Wrong Metric Value Error: The collected value of metric ['tma_l2_hit_latency']
is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
but expected value range is [0, 100]]
Metric validation return with errors. Please check metrics reported with errors.
```
which is a reproducible genuine error and likely requires a metric fix.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250512184700.11691-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf stat --cputype' option can be used to filter which metrics
will be applied; for this reason, the JSON metrics have an associated
PMU.
List this PMU name in the 'perf list' output in JSON mode so that
tooling may access it.
An example of the new field is:
```
{
"MetricGroup": "Backend",
"MetricName": "tma_core_bound",
"MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
"ScaleUnit": "100%",
"BriefDescription": "This metric represents fraction of slots where ...
"PublicDescription": "This metric represents fraction of slots where ...
"Unit": "cpu_core"
},
```
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250512184700.11691-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Unlike with events, metrics can be matched by name or a list of metric
groups.
However, when a metric refers to another metric it isn't referring to a
group but the singular metric in question.
Prior to this change every "id" in a metric expression is checked to see
if it is a metric by scanning all the metrics in the metrics table.
As the table is sorted by metric name, we can speed up the search in the
resolution case by binary searching for the metric.
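The lookup idea can be sketched as follows. This is not the perf implementation, which is in C; the function name and the flat list of names are assumptions, and any case folding the real lookup may apply is ignored.

```python
import bisect


def find_metric(sorted_names, name):
    """Return the index of name in sorted_names, or -1 if it is absent.

    Each lookup costs O(log n) comparisons instead of a scan of every
    metric in the table.
    """
    i = bisect.bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return i
    return -1


names = sorted(["tma_retiring", "tma_backend_bound", "tma_core_bound"])
index = find_metric(names, "tma_core_bound")
```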
Rename some of the metricgroup functions to make it clearer whether
they match a metric by name or by both name and group.
Before:
```
$ time perf test -v 10
10: PMU JSON event tests :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
10.5: Parsing of metric thresholds with fake PMUs : Ok
real 0m15.972s
user 0m13.176s
sys 0m3.001s
```
After:
```
$ time perf test -v 10
10: PMU JSON event tests :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
10.5: Parsing of metric thresholds with fake PMUs : Ok
real 0m5.343s
user 0m1.871s
sys 0m2.128s
```
Committer testing:
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~#
Before:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m9.286s
user 0m9.354s
sys 0m0.062s
root@number:~#
After:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m0.689s
user 0m0.766s
sys 0m0.042s
root@number:~# time perf test 10
10: PMU JSON event tests :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
10.5: Parsing of metric thresholds with fake PMUs : Ok
real 0m0.696s
user 0m0.807s
sys 0m0.064s
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Finding an alias for things like perf_pmu__have_event() requires
searching the aliases list. Whilst this happens relatively infrequently,
it can be a significant overhead in testing.
Switch to using a hashmap. Move common initialization code to
perf_pmu__init(). Refactor the test 'struct perf_pmu_test_pmu' to not
have perf pmu within it to better support the perf_pmu__init() function.
Before:
```
$ time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m13.287s
user 0m13.026s
sys 0m0.532s
```
After:
```
$ time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m13.011s
user 0m12.885s
sys 0m0.485s
```
Committer testing:
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~#
Before:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m9.296s
user 0m9.361s
sys 0m0.063s
root@number:~#
After:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m9.286s
user 0m9.354s
sys 0m0.062s
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The existing fncache can get large in testing situations. As the
bucket array is a fixed size this leads to it degrading to O(n)
performance. Use a regular hashmap that can dynamically reallocate its
array.
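A rough illustration of why a fixed bucket array degrades: with a constant number of buckets, chain length grows linearly with the number of entries, while a table that doubles its array keeps chains short. The bucket counts and the load-factor policy below are assumptions for the sketch, not the values used by fncache or perf's hashmap.

```python
def max_chain(keys, nbuckets):
    """Longest bucket chain when keys are spread over nbuckets buckets."""
    counts = [0] * nbuckets
    for key in keys:
        counts[hash(key) % nbuckets] += 1
    return max(counts)


def grown_bucket_count(nkeys, start=4, max_load=1.0):
    """Bucket count after doubling until the load factor is <= max_load."""
    nbuckets = start
    while nkeys / nbuckets > max_load:
        nbuckets *= 2
    return nbuckets


keys = range(4096)
fixed = max_chain(keys, 64)                               # fixed-size array
dynamic = max_chain(keys, grown_bucket_count(len(keys)))  # resized array
```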
Before:
```
$ time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m14.132s
user 0m17.806s
sys 0m0.557s
```
After:
```
$ time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m13.287s
user 0m13.026s
sys 0m0.532s
```
Committer notes:
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~#
Before:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m9.277s
user 0m9.979s
sys 0m0.055s
root@number:~#
After:
root@number:~# time perf test "Parsing of PMU event table metrics"
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
real 0m9.296s
user 0m9.361s
sys 0m0.063s
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support using a struct pointer for binary attrs. The len field is
maintained because the structs may grow with newer kernel versions or,
more importantly, may be shorter if the binary is built against a newer
uAPI than the kernel it runs on. Since we are storing a pointer to a
struct type, always allocate at least the amount of memory needed by the
struct per the current uAPI headers (unused memory is zeroed). Technically
users should check the length field, but given modern ASAN checks, storing
a short object under a pointer seems like a bad idea.
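The allocation rule can be sketched like this. It is an illustration only; the generated code is C, and the function name and return shape are hypothetical.

```python
def pad_to_struct(payload, struct_size):
    """Return (buffer, length) for a received binary attribute.

    The buffer is at least struct_size bytes, zero-filled past the
    received data; length records how many bytes the kernel sent.
    """
    length = len(payload)
    buf = payload + bytes(max(0, struct_size - length))
    return buf, length


# An older kernel sent 2 bytes for a struct that is 4 bytes in our headers.
buf, length = pad_to_struct(b"\x01\x02", 4)
```

The full struct can then be read safely, while `length` still tells the caller which fields were really provided.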
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We auto-indent if statements (increase the indent of the subsequent
line by 1); do the same thing for else branches without a block.
There haven't been any else branches before, but we're about to add one.
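A toy writer showing the behaviour described above. The class below is hypothetical and far simpler than the generator's real code writer; it only models the rule that a brace-less "if" or a bare "else" indents the next line by one level.

```python
class IndentingWriter:
    """Collects C source lines, auto-indenting single-statement branches."""

    def __init__(self):
        self.lines = []
        self._extra = False  # indent the next line by one extra level

    def p(self, line, depth=0):
        indent = depth + (1 if self._extra else 0)
        self.lines.append("\t" * indent + line)
        stripped = line.strip()
        # A brace-less 'if (...)' or a bare 'else' governs one statement.
        self._extra = (
            (stripped.startswith("if ") and not stripped.endswith("{"))
            or stripped == "else"
        )


writer = IndentingWriter()
for text in ["if (x)", "a();", "else", "b();"]:
    writer.p(text)
```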
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250509154213.1747885-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Sub-type annotation on binary attributes may indicate that the attribute
carries an array of simple types (also referred to as "C array" in docs).
Support rendering them as such in the C user code. For example for u32,
instead of:
struct {
u32 arr;
} _len;
void *arr;
render:
struct {
u32 arr;
} _count;
__u32 *arr;
Note that count is the number of elements while len was the length in bytes.
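The relationship between the two fields is count = len / sizeof(element). A small sketch, assuming a payload of native-endian u32 values; the function name is hypothetical and this is not the generated C code.

```python
import struct


def decode_u32_array(payload):
    """Interpret a binary attribute payload as an array of u32 values."""
    elem_size = struct.calcsize("=I")  # 4 bytes per u32
    count = len(payload) // elem_size  # number of elements, not bytes
    return list(struct.unpack(f"={count}I", payload[:count * elem_size]))


payload = struct.pack("=3I", 1, 2, 3)  # len is 12 bytes
values = decode_u32_array(payload)     # count is 3 elements
```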
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for devmem TX in ncdevmem.
This is a combination of the ncdevmem from the devmem TCP series RFCv1,
which included the TX path, and work by Stan to include the netlink API,
refactored on top of his generic memory_provider support.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-10-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Conflicts:
Documentation/admin-guide/hw-vuln/index.rst
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/alternative.c
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
drivers/base/cpu.c
include/linux/cpu.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/mm/numa.c
arch/x86/mm/pgtable.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/kernel/cpu/bugs.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|