Age | Commit message (Collapse) | Author |
|
Utilize bpf_modify_return_test_tp() kfunc to have a fast way to trigger
tp/raw_tp/fmodret programs from another BPF program, which gives us
comparable batched benchmarks to (batched) kprobe/fentry benchmarks.
We don't switch kprobe/fentry batched benchmarks to this kfunc to make
bench tool usable on older kernels as well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Instead of front-loading all possible benchmarking BPF programs for
trigger benchmarks, explicitly specify which BPF programs are used by
specific benchmark and load only it.
This allows to be more flexible in supporting older kernels, where some
program types might not be possible to load (e.g., those that rely on
newly added kfunc).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Remove "legacy" benchmarks triggered by syscalls in favor of newly added
in-kernel/batched benchmarks. Drop -batched suffix now as well.
Next patch will restore "feature parity" by adding back
tp/raw_tp/fmodret benchmarks based on in-kernel kfunc approach.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Existing kprobe/fentry triggering benchmarks have 1-to-1 mapping between
one syscall execution and BPF program run. While we use a fast
get_pgid() syscall, syscall overhead can still be non-trivial.
This patch adds kprobe/fentry set of benchmarks significantly amortizing
the cost of syscall vs actual BPF triggering overhead. We do this by
employing BPF_PROG_TEST_RUN command to trigger "driver" raw_tp program
which does a tight parameterized loop calling cheap BPF helper
(bpf_get_numa_node_id()), to which kprobe/fentry programs are
attached for benchmarking.
This way 1 bpf() syscall causes N executions of BPF program being
benchmarked. N defaults to 100, but can be adjusted with
--trig-batch-iters CLI argument.
For comparison we also implement a new baseline program that instead of
triggering another BPF program just does N atomic per-CPU counter
increments, establishing the limit for all other types of program within
this batched benchmarking setup.
Taking the final set of benchmarks added in this patch set (including
tp/raw_tp/fmodret, added in later patch), and keeping for now "legacy"
syscall-driven benchmarks, we can capture all triggering benchmarks in
one place for comparison, before we remove the legacy ones (and rename
xxx-batched into just xxx).
$ benchs/run_bench_trigger.sh
usermode-count : 79.500 ± 0.024M/s
kernel-count : 49.949 ± 0.081M/s
syscall-count : 9.009 ± 0.007M/s
fentry-batch : 31.002 ± 0.015M/s
fexit-batch : 20.372 ± 0.028M/s
fmodret-batch : 21.651 ± 0.659M/s
rawtp-batch : 36.775 ± 0.264M/s
tp-batch : 19.411 ± 0.248M/s
kprobe-batch : 12.949 ± 0.220M/s
kprobe-multi-batch : 15.400 ± 0.007M/s
kretprobe-batch : 5.559 ± 0.011M/s
kretprobe-multi-batch: 5.861 ± 0.003M/s
fentry-legacy : 8.329 ± 0.004M/s
fexit-legacy : 6.239 ± 0.003M/s
fmodret-legacy : 6.595 ± 0.001M/s
rawtp-legacy : 8.305 ± 0.004M/s
tp-legacy : 6.382 ± 0.001M/s
kprobe-legacy : 5.528 ± 0.003M/s
kprobe-multi-legacy : 5.864 ± 0.022M/s
kretprobe-legacy : 3.081 ± 0.001M/s
kretprobe-multi-legacy: 3.193 ± 0.001M/s
Note how xxx-batch variants are measured with significantly higher
throughput, even though it's exactly the same in-kernel overhead. As
such, results can be compared only between benchmarks of the same kind
(syscall vs batched):
fentry-legacy : 8.329 ± 0.004M/s
fentry-batch : 31.002 ± 0.015M/s
kprobe-multi-legacy : 5.864 ± 0.022M/s
kprobe-multi-batch : 15.400 ± 0.007M/s
Note also that syscall-count is setting a theoretical limit for
syscall-triggered benchmarks, while kernel-count is setting similar
limits for batch variants. usermode-count is a happy and unachievable
case of user space counting without doing any syscalls, and is mostly
the measure of CPU speed for such a trivial benchmark.
As was mentioned, tp/raw_tp/fmodret require kernel-side kfunc to produce
similar benchmark, which we address in a separate patch.
Note that run_bench_trigger.sh allows to override a list of benchmarks
to run, which is very useful for performance work.
Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Rename uprobe-base to more precise usermode-count (it will match other
baseline-like benchmarks, kernel-count and syscall-count). Also use
BENCH_TRIG_USERMODE() macro to define all usermode-based triggering
benchmarks, which include usermode-count and uprobe/uretprobe benchmarks.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF verifier emits "unknown func" message when given BPF program type
does not support BPF helper. This message may be confusing for users, as
important context that helper is unknown only to current program type is
not provided.
This patch changes message to "program of this type cannot use helper "
and aligns dependent code in libbpf and tests. Any suggestions on
improving/changing this message are welcome.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch extends the fib_lookup test suite by adding a few test
cases for each IP family to test the new BPF_FIB_LOOKUP_MARK flag
to the bpf_fib_lookup:
* Test destination IP address selection with and without a mark
and/or the BPF_FIB_LOOKUP_MARK flag set
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-3-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a missing check to bloom filter creating, rejecting
values above KMALLOC_MAX_SIZE. This brings the bloom map in line with
many other map types.
The lack of this protection can cause kernel crashes for value sizes
that overflow int's. Such a crash was caught by syzkaller. The next
patch adds more guard-rails at a lower level.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240327024245.318299-2-andreimatei1@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Recently, I frequently hit the following test failure:
[root@arch-fb-vm1 bpf]# ./test_progs -n 33/1
test_lookup_update:PASS:skel_open 0 nsec
[...]
test_lookup_update:PASS:sync_rcu 0 nsec
test_lookup_update:FAIL:map1_leak inner_map1 leaked!
#33/1 btf_map_in_map/lookup_update:FAIL
#33 btf_map_in_map:FAIL
In the test, after map is closed and then after two rcu grace periods,
it is assumed that map_id is not available to user space.
But the above assumption cannot be guaranteed. After zero or one
or two rcu grace periods in different siturations, the actual
freeing-map-work is put into a workqueue. Later on, when the work
is dequeued, the map will be actually freed.
See bpf_map_put() in kernel/bpf/syscall.c.
By using workqueue, there is no ganrantee that map will be actually
freed after a couple of rcu grace periods. This patch removed
such map leak detection and then the test can pass consistently.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240322061353.632136-1-yonghong.song@linux.dev
|
|
To simplify the code, use BPF selftests helper start_server() in
bpf_tcp_ca.c instead of open-coding it. This helper is defined in
network_helpers.c, and exported in network_helpers.h, which is already
included in bpf_tcp_ca.c.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/9926a79118db27dd6d91c4854db011c599cabd0e.1711331517.git.tanggeliang@kylinos.cn
|
|
The arena_list selftest uses (1ull << 32) in the mmap address
computation for arm64. Use the same in the verifier_arena selftest.
This makes the selftest pass for arm64 on the CI[1].
[1] https://github.com/kernel-patches/bpf/pull/6622
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20240322133552.70681-1-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some distros seem to enable the -fcf-protection=branch by default,
which breaks our setup on first instruction of uprobe trigger
functions and place there endbr64 instruction.
Marking them with nocf_check attribute to skip that.
Ignoring unknown attribute warning in gcc for bench objects, because
nocf_check can be used only when -fcf-protection=branch is enabled,
otherwise we get a warning and break compilation.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240322134936.1075395-1-jolsa@kernel.org
|
|
With glibc 2.28, selftests compilation fails for benchs/bench_trigger.c:
benchs/bench_trigger.c: In function ‘inc_counter’:
benchs/bench_trigger.c:25:23: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
25 | tid = gettid();
| ^~~~~~
| getgid
cc1: all warnings being treated as errors
It appears support for the gettid() wrapper is variable across glibc
versions, so may be safer to use syscall(SYS_gettid) instead.
Fixes: 520fad2e3206 ("selftests/bpf: scale benchmark counting by using per-CPU counters")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240322095728.95671-1-alan.maguire@oracle.com
|
|
When benchmarking with multiple threads (-pN, where N>1), we start
contending on single atomic counter that both BPF trigger benchmarks are
using, as well as "baseline" tests in user space (trig-base and
trig-uprobe-base benchmarks). As such, we start bottlenecking on
something completely irrelevant to benchmark at hand.
Scale counting up by using per-CPU counters on BPF side. On use space
side we do the next best thing: hash thread ID to approximate per-CPU
behavior. It seems to work quite well in practice.
To demonstrate the difference, I ran three benchmarks with 1, 2, 4, 8,
16, and 32 threads:
- trig-uprobe-base (no syscalls, pure tight counting loop in user-space);
- trig-base (get_pgid() syscall, atomic counter in user-space);
- trig-fentry (syscall to trigger fentry program, atomic uncontended per-CPU
counter on BPF side).
Command used:
for b in uprobe-base base fentry; do \
for p in 1 2 4 8 16 32; do \
printf "%-11s %2d: %s\n" $b $p \
"$(sudo ./bench -w2 -d5 -a -p$p trig-$b | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)"; \
done; \
done
Before these changes, aggregate throughput across all threads doesn't
scale well with number of threads, it actually even falls sharply for
uprobe-base due to a very high contention:
uprobe-base 1: 138.998 ± 0.650M/s
uprobe-base 2: 70.526 ± 1.147M/s
uprobe-base 4: 63.114 ± 0.302M/s
uprobe-base 8: 54.177 ± 0.138M/s
uprobe-base 16: 45.439 ± 0.057M/s
uprobe-base 32: 37.163 ± 0.242M/s
base 1: 16.940 ± 0.182M/s
base 2: 19.231 ± 0.105M/s
base 4: 21.479 ± 0.038M/s
base 8: 23.030 ± 0.037M/s
base 16: 22.034 ± 0.004M/s
base 32: 18.152 ± 0.013M/s
fentry 1: 14.794 ± 0.054M/s
fentry 2: 17.341 ± 0.055M/s
fentry 4: 23.792 ± 0.024M/s
fentry 8: 21.557 ± 0.047M/s
fentry 16: 21.121 ± 0.004M/s
fentry 32: 17.067 ± 0.023M/s
After these changes, we see almost perfect linear scaling, as expected.
The sub-linear scaling when going from 8 to 16 threads is interesting
and consistent on my test machine, but I haven't investigated what is
causing it this peculiar slowdown (across all benchmarks, could be due
to hyperthreading effects, not sure).
uprobe-base 1: 139.980 ± 0.648M/s
uprobe-base 2: 270.244 ± 0.379M/s
uprobe-base 4: 532.044 ± 1.519M/s
uprobe-base 8: 1004.571 ± 3.174M/s
uprobe-base 16: 1720.098 ± 0.744M/s
uprobe-base 32: 3506.659 ± 8.549M/s
base 1: 16.869 ± 0.071M/s
base 2: 33.007 ± 0.092M/s
base 4: 64.670 ± 0.203M/s
base 8: 121.969 ± 0.210M/s
base 16: 207.832 ± 0.112M/s
base 32: 424.227 ± 1.477M/s
fentry 1: 14.777 ± 0.087M/s
fentry 2: 28.575 ± 0.146M/s
fentry 4: 56.234 ± 0.176M/s
fentry 8: 106.095 ± 0.385M/s
fentry 16: 181.440 ± 0.032M/s
fentry 32: 369.131 ± 0.693M/s
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240315213329.1161589-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test validating BPF cookie can be passed during raw_tp/tp_btf
attachment and can be retried at runtime with bpf_get_attach_cookie()
helper.
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-6-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gmail.com
|
|
Add a sk_msg bpf program test where the program is running in a pid
namespace. The test is successful:
#165/4 ns_current_pid_tgid/new_ns_sk_msg:OK
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240315184915.2976718-1-yonghong.song@linux.dev
|
|
Add a cgroup bpf program test where the bpf program is running
in a pid namespace. The test is successfully:
#165/3 ns_current_pid_tgid/new_ns_cgrp:OK
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240315184910.2976522-1-yonghong.song@linux.dev
|
|
Refactor some functions in both user space code and bpf program
as these functions are used by later cgroup/sk_msg tests.
Another change is to mark tp program optional loading as later
patches will use optional loading as well since they have quite
different attachment and testing logic.
There is no functionality change.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240315184904.2976123-1-yonghong.song@linux.dev
|
|
Replace CHECK in selftest ns_current_pid_tgid with recommended ASSERT_* style.
I also shortened subtest name as the prefix of subtest name is covered
by the test name already.
This patch does fix a testing issue. Currently even if bss->user_{pid,tgid}
is not correct, the test still passed since the clone func returns 0.
I fixed it to return a non-zero value if bss->user_{pid,tgid} is incorrect.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240315184859.2975543-1-yonghong.song@linux.dev
|
|
Check that 4Gbyte arena can be allocated and overflow/underflow access in
the first and the last page behaves as expected.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-5-alexei.starovoitov@gmail.com
|
|
Remove hard coded PAGE_SIZE.
Add #include <sys/user.h> instead (that works on x86-64 and s390)
and fallback to slow getpagesize() for aarch64.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-4-alexei.starovoitov@gmail.com
|
|
The selftests use
to tell LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff https://github.com/llvm/llvm-project/pull/85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
placed in section named ".addr_space.N" instead of ".arena.N".
Adjust libbpf, bpftool, and selftests to match LLVM.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-3-alexei.starovoitov@gmail.com
|
|
There are statements with two semicolons. Remove the second one, it
is redundant.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240315092654.2431062-1-colin.i.king@gmail.com
|
|
A new version of a type may have additional fields that do not exist in
older versions. Previously, libbpf would reject struct_ops maps with a new
version containing extra fields when running on a machine with an old
kernel. However, we have updated libbpf to ignore these fields if their
values are all zeros or null in order to provide backward compatibility.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240313214139.685112-3-thinker.li@gmail.com
|
|
According to a report, skeletons fail to assign shadow pointers when being
compiled with C++ programs. Unlike C doing implicit casting for void
pointers, C++ requires an explicit casting.
To support C++, we do explicit casting for each shadow pointer.
Also add struct_ops_module.skel.h to test_cpp to validate C++
compilation as part of BPF selftests.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240312013726.1780720-1-thinker.li@gmail.com
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2024-03-11
We've added 59 non-merge commits during the last 9 day(s) which contain
a total of 88 files changed, 4181 insertions(+), 590 deletions(-).
The main changes are:
1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena,
from Alexei.
2) Introduce bpf_arena which is sparse shared memory region between bpf
program and user space where structures inside the arena can have
pointers to other areas of the arena, and pointers work seamlessly for
both user-space programs and bpf programs, from Alexei and Andrii.
3) Introduce may_goto instruction that is a contract between the verifier
and the program. The verifier allows the program to loop assuming it's
behaving well, but reserves the right to terminate it, from Alexei.
4) Use IETF format for field definitions in the BPF standard
document, from Dave.
5) Extend struct_ops libbpf APIs to allow specify version suffixes for
stuct_ops map types, share the same BPF program between several map
definitions, and other improvements, from Eduard.
6) Enable struct_ops support for more than one page in trampolines,
from Kui-Feng.
7) Support kCFI + BPF on riscv64, from Puranjay.
8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay.
9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke.
====================
Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding kprobe multi triggering benchmarks. It's useful now to bench
new fprobe implementation and might be useful later as well.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240311211023.590321-1-jolsa@kernel.org
|
|
bpf_arena_htab.h - hash table implemented as bpf program
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-15-alexei.starovoitov@gmail.com
|
|
bpf_arena_alloc.h - implements page_frag allocator as a bpf program.
bpf_arena_list.h - doubly linked link list as a bpf program.
Compiled as a bpf program and as native C code.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-14-alexei.starovoitov@gmail.com
|
|
Add unit tests for bpf_arena_alloc/free_pages() functionality
and bpf_arena_common.h with a set of common helpers and macros that
is used in this test and the following patches.
Also modify test_loader that didn't support running bpf_prog_type_syscall
programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-13-alexei.starovoitov@gmail.com
|
|
Introduce helper macro bpf_addr_space_cast() that emits:
rX = rX
instruction with off = BPF_ADDR_SPACE_CAST
and encodes dest and src address_space-s into imm32.
It's useful with older LLVM that doesn't emit this insn automatically.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-12-alexei.starovoitov@gmail.com
|
|
We already have kprobe and fentry benchmarks. Let's add kretprobe and
fexit ones for completeness.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Two test cases to verify that '?' and other printable characters are
allowed in BTF DATASEC names:
- DATASEC with name "?.foo bar:buz" should be accepted;
- type with name "?foo" should be rejected.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-16-eddyz87@gmail.com
|
|
Check that "?.struct_ops" and "?.struct_ops.link" section names define
struct_ops maps with autocreate == false after open.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-14-eddyz87@gmail.com
|
|
Check that autocreate flags of struct_ops map cause autoload of
struct_ops corresponding programs:
- when struct_ops program is referenced only from a map for which
autocreate is set to false, that program should not be loaded;
- when struct_ops program with autoload == false is set to be used
from a map with autocreate == true using shadow var,
that program should be loaded;
- when struct_ops program is not referenced from any map object load
should fail.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-10-eddyz87@gmail.com
|
|
Check that bpf_map__set_autocreate() can be used to disable automatic
creation for struct_ops maps.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-8-eddyz87@gmail.com
|
|
When loading struct_ops programs kernel requires BTF id of the
struct_ops type and member index for attachment point inside that
type. This makes impossible to use same BPF program in several
struct_ops maps that have different struct_ops type.
Check if libbpf rejects such BPF objects files.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-7-eddyz87@gmail.com
|
|
Several test_progs tests already capture libbpf log in order to check
for some expected output, e.g bpf_tcp_ca.c, kfunc_dynptr_param.c,
log_buf.c and a few others.
This commit provides a, hopefully, simple API to capture libbpf log
w/o necessity to define new print callback in each test:
/* Creates a global memstream capturing INFO and WARN level output
* passed to libbpf_print_fn.
* Returns 0 on success, negative value on failure.
* On failure the description is printed using PRINT_FAIL and
* current test case is marked as fail.
*/
int start_libbpf_log_capture(void)
/* Destroys global memstream created by start_libbpf_log_capture().
* Returns a pointer to captured data which has to be freed.
* Returned buffer is null terminated.
*/
char *stop_libbpf_log_capture(void)
The intended usage is as follows:
if (start_libbpf_log_capture())
return;
use_libbpf();
char *log = stop_libbpf_log_capture();
ASSERT_HAS_SUBSTR(log, "... expected ...", "expected some message");
free(log);
As a safety measure, free(start_libbpf_log_capture()) is invoked in the
epilogue of the test_progs.c:run_one_test().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-6-eddyz87@gmail.com
|
|
Extend struct_ops_module test case to check if it is possible to use
'___' suffixes for struct_ops type specification.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-5-eddyz87@gmail.com
|
|
Add tests for may_goto instruction via cond_break macro.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-5-alexei.starovoitov@gmail.com
|
|
Use may_goto instruction to implement cond_break macro.
Ideally the macro should be written as:
asm volatile goto(".byte 0xe5;
.byte 0;
.short %l[l_break] ...
.long 0;
but LLVM doesn't recognize fixup of 2 byte PC relative yet.
Hence use
asm volatile goto(".byte 0xe5;
.byte 0;
.long %l[l_break] ...
.short 0;
that produces correct asm on little endian.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-4-alexei.starovoitov@gmail.com
|
|
Adjust the XDP feature flags for the bond device when no bond slave
devices are attached. After 9b0ed890ac2a ("bonding: do not report
NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0
as flags instead of NETDEV_XDP_ACT_MASK.
# ./vmtest.sh -- ./test_progs -t xdp_bond
[...]
[ 3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface
[ 3.995434] bond1 (unregistering): Released all slaves
[ 4.022311] bond2: (slave veth2_1): Releasing backup interface
#507/1 xdp_bonding/xdp_bonding_attach:OK
#507/2 xdp_bonding/xdp_bonding_nested:OK
#507/3 xdp_bonding/xdp_bonding_features:OK
#507/4 xdp_bonding/xdp_bonding_roundrobin:OK
#507/5 xdp_bonding/xdp_bonding_activebackup:OK
#507/6 xdp_bonding/xdp_bonding_xor_layer2:OK
#507/7 xdp_bonding/xdp_bonding_xor_layer23:OK
#507/8 xdp_bonding/xdp_bonding_xor_layer34:OK
#507/9 xdp_bonding/xdp_bonding_redirect_multi:OK
#507 xdp_bonding:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
[ 4.185255] bond2 (unregistering): Released all slaves
[...]
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-2-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test case was minimized from mailing list discussion [0].
It is equivalent to the following C program:
struct iter_limit_bug_ctx { __u64 a; __u64 b; __u64 c; };
static __naked void iter_limit_bug_cb(void)
{
switch (bpf_get_prandom_u32()) {
case 1: ctx->a = 42; break;
case 2: ctx->b = 42; break;
default: ctx->c = 42; break;
}
}
int iter_limit_bug(struct __sk_buff *skb)
{
struct iter_limit_bug_ctx ctx = { 7, 7, 7 };
bpf_loop(2, iter_limit_bug_cb, &ctx, 0);
if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
asm volatile("r1 /= 0;":::"r1");
return 0;
}
The main idea is that each loop iteration changes one of the state
variables in a non-deterministic manner. Hence it is premature to
prune the states that have two iterations left comparing them to
states with one iteration left.
E.g. {{7,7,7}, callback_depth=0} can reach state {42,42,7},
while {{7,7,7}, callback_depth=1} can't.
[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240222154121.6991-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Create and load a struct_ops map with a large number of struct_ops
programs to generate trampolines taking a size over multiple pages. The
map includes 40 programs. Their trampolines takes 6.6k+, more than 1.5
pages, on x86.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240224223418.526631-4-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
In current ping-pong design, xdp_hw_metadata will wait until the packet
transmission completely done, then only start to receive the next packet.
The current sleep interval is 10ms, which is unnecessary large. Typically,
a NIC does not need such a long time to transmit a packet. Furthermore,
during this 10ms sleep time, the app is unable to receive incoming packets.
Therefore, this commit reduce sleep interval to 10us, so that
xdp_hw_metadata is able to support periodic packets with shorter interval.
10us * 500 = 5ms should be enough for packet transmission and status
retrieval.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240303083225.1184165-2-yoong.siang.song@intel.com
|
|
Settle on three "flavors" of uprobe/uretprobe, installed on different
kinds of instruction: nop, push, and ret. All three are testing
different internal code paths emulating or single-stepping instructions,
so are interesting to compare and benchmark separately.
To ensure `push rbp` instruction we ensure that uprobe_target_push() is
not a leaf function by calling (global __weak) noop function and
returning something afterwards (if we don't do that, compiler will just
do a tail call optimization).
Also, we need to make sure that compiler isn't skipping frame pointer
generation, so let's add `-fno-omit-frame-pointers` to Makefile.
Just to give an idea of where we currently stand in terms of relative
performance of different uprobe/uretprobe cases vs a cheap syscall
(getpgid()) baseline, here are results from my local machine:
$ benchs/run_bench_uprobes.sh
base : 1.561 ± 0.020M/s
uprobe-nop : 0.947 ± 0.007M/s
uprobe-push : 0.951 ± 0.004M/s
uprobe-ret : 0.443 ± 0.007M/s
uretprobe-nop : 0.471 ± 0.013M/s
uretprobe-push : 0.483 ± 0.004M/s
uretprobe-ret : 0.306 ± 0.007M/s
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240301214551.1686095-1-andrii@kernel.org
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|