Age | Commit message (Collapse) | Author |
|
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:
../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
207 | *(__be16 *)&key->data[i]);
| ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
| ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
97 | #define be16_to_cpu __be16_to_cpu
| ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
| ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
82 | __u8 data[0]; /* Arbitrary size */
| ^~~~
And found at run-time under CONFIG_FORTIFY_SOURCE:
UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
index 0 is out of range for type '__u8 [*]'
Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:
struct egress_gw_policy_key {
struct bpf_lpm_trie_key lpm_key;
__u32 saddr;
__u32 daddr;
};
While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:
struct egress_gw_policy_key in_key = {
.lpm_key = { 32 + 24, {} },
.saddr = CLIENT_IP,
.daddr = EXTERNAL_SVC_IP & 0Xffffff,
};
To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.
Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.
Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/udp.c
f796feabb9f5 ("udp: add local "peek offset enabled" flag")
56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)")
Adjacent changes:
net/unix/garbage.c
aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.")
11498715f266 ("af_unix: Remove io_uring code for GC.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit updates tcp_custom_syncookie.c:tcp_parse_option() to use
explicit packet offset (ctx->off) for packet access instead of ever
moving pointer (ctx->ptr), this reduces verification complexity:
- the tcp_parse_option() is passed as a callback to bpf_loop();
- suppose a checkpoint is created each time at function entry;
- the ctx->ptr is tracked by verifier as PTR_TO_PACKET;
- the ctx->ptr is incremented in tcp_parse_option(),
thus umax_value field tracked for it is incremented as well;
- on each next iteration of tcp_parse_option()
checkpoint from a previous iteration can't be reused
for state pruning, because PTR_TO_PACKET registers are
considered equivalent only if old->umax_value >= cur->umax_value;
- on the other hand, the ctx->off is a SCALAR,
subject to widen_imprecise_scalars();
- it's exact bounds are eventually forgotten and it is tracked as
unknown scalar at entry to tcp_parse_option();
- hence checkpoints created at the start of the function eventually
converge.
The change is similar to one applied in [0] to xdp_synproxy_kern.c.
Comparing before and after with veristat yields following results:
File Insns (A) Insns (B) Insns (DIFF)
------------------------------- --------- --------- -----------------
test_tcp_custom_syncookie.bpf.o 466657 12423 -454234 (-97.34%)
[0] commit 977bc146d4eb ("selftests/bpf: track tcp payload offset as scalar in xdp_synproxy")
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240222150300.14909-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Incorporate a test case to assess the handling of invalid flags or
task__nullable parameters passed to bpf_iter_task_new(). Prior to the
preceding commit, this scenario could potentially trigger a kernel panic.
However, with the previous commit, this test case is expected to function
correctly.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240217114152.1623-3-laoar.shao@gmail.com
|
|
bpf_timer_cancel
This selftest is based on a Alexei's test adopted from an internal
user to troubleshoot another bug. During this exercise, a separate
racing bug was discovered between bpf_timer_cancel_and_free
and bpf_timer_cancel. The details can be found in the previous
patch.
This patch is to add a selftest that can trigger the bug.
I can trigger the UAF everytime in my qemu setup with KASAN. The idea
is to have multiple user space threads running in a tight loop to exercise
both bpf_map_update_elem (which calls into bpf_timer_cancel_and_free)
and bpf_timer_cancel.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240215211218.990808-2-martin.lau@linux.dev
|
|
Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
so add one test case to ensure that the problem is fixed. Beside those
four bpf helpers mentioned above, testing the read of vsyscall page by
using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.
The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:
1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:
bpf_probe_read_kernel_common
copy_from_kernel_nofault
// false, return -ERANGE
copy_from_kernel_nofault_allowed
2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
as show below:
bpf_probe_read_user_common
copy_from_user_nofault
// false, return -EFAULT
__access_ok
3) For bpf_copy_from_user(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user
copy_from_user
_copy_from_user
// return false
access_ok
4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user_task
access_process_vm
// return 0
vma_lookup()
// return 0
expand_stack()
The occurrence of oops depends on the availability of CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total
of six possible combinations. Under all these combinations, the test
case runs successfully.
[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With latest llvm19, I hit the following selftest failures with
$ ./test_progs -j
libbpf: prog 'on_event': BPF program load failed: Permission denied
libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
combined stack size of 4 calls is 544. Too large
verification time 1344153 usec
stack depth 24+440+0+32
processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146
-- END PROG LOAD LOG --
libbpf: prog 'on_event': failed to load: -13
libbpf: failed to load object 'strobemeta_subprogs.bpf.o'
scale_test:FAIL:expect_success unexpected error: -13 (errno 13)
#498 verif_scale_strobemeta_subprogs:FAIL
The verifier complains too big of the combined stack size (544 bytes) which
exceeds the maximum stack limit 512. This is a regression from llvm19 ([1]).
In the above error log, the original stack depth is 24+440+0+32.
To satisfy interpreter's need, in verifier the stack depth is adjusted to
32+448+32+32=544 which exceeds 512, hence the error. The same adjusted
stack size is also used for jit case.
But the jitted codes could use smaller stack size.
$ egrep -r stack_depth | grep round_up
arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16);
loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8);
sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16);
x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
x86/net/bpf_jit_comp.c: round_up(stack_depth, 8));
x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16.
So stack is aligned in either 8 or 16, x86/s390 having 8-byte stack alignment and
the rest having 16-byte alignment.
This patch calculates total stack depth based on 16-byte alignment if jit is requested.
For the above failing case, the new stack size will be 32+448+0+32=512 and no verification
failure. llvm19 regression will be discussed separately in llvm upstream.
The verifier change caused three test failures as these tests compared messages
with stack size. More specifically,
- test_global_funcs/global_func1: fail with interpreter mode and success with jit mode.
Adjusted stack sizes so both jit and interpreter modes will fail.
- async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit and interpreter
will calculate different stack sizes, the failure msg is adjusted to omit those
specific stack size numbers.
[1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240214232951.4113094-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests validating that kernel handles pointer to anonymous struct
argument as PTR_TO_MEM case, not as PTR_TO_CTX case.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240212233221.2575350-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For program types that don't have named context type name (e.g., BPF
iterator programs or tracepoint programs), ctx_tname will be a non-NULL
empty string. For such programs it shouldn't be possible to have
PTR_TO_CTX argument for global subprogs based on type name alone.
arg:ctx tag is the only way to have PTR_TO_CTX passed into global
subprog for such program types.
Fix this loophole, which currently would assume PTR_TO_CTX whenever
user uses a pointer to anonymous struct as an argument to their global
subprogs. This happens in practice with the following (quite common, in
practice) approach:
typedef struct { /* anonymous */
int x;
} my_type_t;
int my_subprog(my_type_t *arg) { ... }
User's intent is to have PTR_TO_MEM argument for `arg`, but verifier
will complain about expecting PTR_TO_CTX.
This fix also closes unintended s390x-specific KPROBE handling of
PTR_TO_CTX case. Selftest change is necessary to accommodate this.
Fixes: 91cc1a99740e ("bpf: Annotate context types")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240212233221.2575350-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test if the verifier verifies nullable pointer arguments correctly for BPF
struct_ops programs.
"test_maybe_null" in struct bpf_testmod_ops is the operator defined for the
test cases here.
A BPF program should check a pointer for NULL beforehand to access the
value pointed by the nullable pointer arguments, or the verifier should
reject the programs. The test here includes two parts; the programs
checking pointers properly and the programs not checking pointers
beforehand. The test checks if the verifier accepts the programs checking
properly and rejects the programs not checking at all.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240209023750.1153905-5-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
[Changes from V1:
- Avoid conflict by rebasing with latest master.]
Some BPF tests use loop unrolling compiler pragmas that are clang
specific and not supported by GCC. These pragmas, along with their
GCC equivalences are:
#pragma clang loop unroll_count(N)
#pragma GCC unroll N
#pragma clang loop unroll(full)
#pragma GCC unroll 65534
#pragma clang loop unroll(disable)
#pragma GCC unroll 1
#pragma unroll [aka #pragma clang loop unroll(enable)]
There is no GCC equivalence to this pragma. It enables unrolling on
loops that the compiler would not ordinarily unroll even with
-O2|-funroll-loops, but it is not equivalent to full unrolling
either.
This patch adds a new header progs/bpf_compiler.h that defines the
following macros, which correspond to each pair of compiler-specific
pragmas above:
__pragma_loop_unroll_count(N)
__pragma_loop_unroll_full
__pragma_loop_no_unroll
__pragma_loop_unroll
The selftests using loop unrolling pragmas are then changed to include
the header and use these macros in place of the explicit pragmas.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240208203612.29611-1-jose.marchesi@oracle.com
|
|
Add two tests to ensure fentry programs cannot attach to
bpf_spin_{lock,unlock}() helpers. The tracing_failure.c files
can be used in the future for other tracing failure cases.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240207070107.335341-1-yonghong.song@linux.dev
|
|
In various performance profiles of kernels with BPF programs attached,
bpf_local_storage_lookup() appears as a significant portion of CPU
cycles spent. To enable the compiler generate more optimal code, turn
bpf_local_storage_lookup() into a static inline function, where only the
cache insertion code path is outlined
Notably, outlining cache insertion helps avoid bloating callers by
duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on
architectures which do not inline spin_lock/unlock, such as x86), which
would cause the compiler produce worse code by deciding to outline
otherwise inlinable functions. The call overhead is neutral, because we
make 2 calls either way: either calling raw_spin_lock_irqsave() and
raw_spin_unlock_irqsave(); or call __bpf_local_storage_insert_cache(),
which calls raw_spin_lock_irqsave(), followed by a tail-call to
raw_spin_unlock_irqsave() where the compiler can perform TCO and (in
optimized uninstrumented builds) turns it into a plain jump. The call to
__bpf_local_storage_insert_cache() can be elided entirely if
cacheit_lockit is a false constant expression.
Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
reboot between each trial; x86 defconfig + BPF, clang 16) this produces
improvements in throughput and latency in the majority of cases, with an
average (geomean) improvement of 8%:
+---- Hashmap Control --------------------
|
| + num keys: 10
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 14.789 M ops/s | 14.745 M ops/s ( ~ )
| +- hits latency | 67.679 ns/op | 67.879 ns/op ( ~ )
| +- important_hits throughput | 14.789 M ops/s | 14.745 M ops/s ( ~ )
|
| + num keys: 1000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 12.233 M ops/s | 12.170 M ops/s ( ~ )
| +- hits latency | 81.754 ns/op | 82.185 ns/op ( ~ )
| +- important_hits throughput | 12.233 M ops/s | 12.170 M ops/s ( ~ )
|
| + num keys: 10000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 7.220 M ops/s | 7.204 M ops/s ( ~ )
| +- hits latency | 138.522 ns/op | 138.842 ns/op ( ~ )
| +- important_hits throughput | 7.220 M ops/s | 7.204 M ops/s ( ~ )
|
| + num keys: 100000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 5.061 M ops/s | 5.165 M ops/s (+2.1%)
| +- hits latency | 198.483 ns/op | 194.270 ns/op (-2.1%)
| +- important_hits throughput | 5.061 M ops/s | 5.165 M ops/s (+2.1%)
|
| + num keys: 4194304
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 2.864 M ops/s | 2.882 M ops/s ( ~ )
| +- hits latency | 365.220 ns/op | 361.418 ns/op (-1.0%)
| +- important_hits throughput | 2.864 M ops/s | 2.882 M ops/s ( ~ )
|
+---- Local Storage ----------------------
|
| + num_maps: 1
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 33.005 M ops/s | 39.068 M ops/s (+18.4%)
| +- hits latency | 30.300 ns/op | 25.598 ns/op (-15.5%)
| +- important_hits throughput | 33.005 M ops/s | 39.068 M ops/s (+18.4%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 37.151 M ops/s | 44.926 M ops/s (+20.9%)
| +- hits latency | 26.919 ns/op | 22.259 ns/op (-17.3%)
| +- important_hits throughput | 37.151 M ops/s | 44.926 M ops/s (+20.9%)
|
| + num_maps: 10
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 32.288 M ops/s | 38.099 M ops/s (+18.0%)
| +- hits latency | 30.972 ns/op | 26.248 ns/op (-15.3%)
| +- important_hits throughput | 3.229 M ops/s | 3.810 M ops/s (+18.0%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 34.473 M ops/s | 41.145 M ops/s (+19.4%)
| +- hits latency | 29.010 ns/op | 24.307 ns/op (-16.2%)
| +- important_hits throughput | 12.312 M ops/s | 14.695 M ops/s (+19.4%)
|
| + num_maps: 16
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 32.524 M ops/s | 38.341 M ops/s (+17.9%)
| +- hits latency | 30.748 ns/op | 26.083 ns/op (-15.2%)
| +- important_hits throughput | 2.033 M ops/s | 2.396 M ops/s (+17.9%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 34.575 M ops/s | 41.338 M ops/s (+19.6%)
| +- hits latency | 28.925 ns/op | 24.193 ns/op (-16.4%)
| +- important_hits throughput | 11.001 M ops/s | 13.153 M ops/s (+19.6%)
|
| + num_maps: 17
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 28.861 M ops/s | 32.756 M ops/s (+13.5%)
| +- hits latency | 34.649 ns/op | 30.530 ns/op (-11.9%)
| +- important_hits throughput | 1.700 M ops/s | 1.929 M ops/s (+13.5%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 31.529 M ops/s | 36.110 M ops/s (+14.5%)
| +- hits latency | 31.719 ns/op | 27.697 ns/op (-12.7%)
| +- important_hits throughput | 9.598 M ops/s | 10.993 M ops/s (+14.5%)
|
| + num_maps: 24
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 18.602 M ops/s | 19.937 M ops/s (+7.2%)
| +- hits latency | 53.767 ns/op | 50.166 ns/op (-6.7%)
| +- important_hits throughput | 0.776 M ops/s | 0.831 M ops/s (+7.2%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 21.718 M ops/s | 23.332 M ops/s (+7.4%)
| +- hits latency | 46.047 ns/op | 42.865 ns/op (-6.9%)
| +- important_hits throughput | 6.110 M ops/s | 6.564 M ops/s (+7.4%)
|
| + num_maps: 32
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 14.118 M ops/s | 14.626 M ops/s (+3.6%)
| +- hits latency | 70.856 ns/op | 68.381 ns/op (-3.5%)
| +- important_hits throughput | 0.442 M ops/s | 0.458 M ops/s (+3.6%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 17.111 M ops/s | 17.906 M ops/s (+4.6%)
| +- hits latency | 58.451 ns/op | 55.865 ns/op (-4.4%)
| +- important_hits throughput | 4.776 M ops/s | 4.998 M ops/s (+4.6%)
|
| + num_maps: 100
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 5.281 M ops/s | 5.528 M ops/s (+4.7%)
| +- hits latency | 192.398 ns/op | 183.059 ns/op (-4.9%)
| +- important_hits throughput | 0.053 M ops/s | 0.055 M ops/s (+4.9%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 6.265 M ops/s | 6.498 M ops/s (+3.7%)
| +- hits latency | 161.436 ns/op | 152.877 ns/op (-5.3%)
| +- important_hits throughput | 1.636 M ops/s | 1.697 M ops/s (+3.7%)
|
| + num_maps: 1000
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 0.355 M ops/s | 0.354 M ops/s ( ~ )
| +- hits latency | 2826.538 ns/op | 2827.139 ns/op ( ~ )
| +- important_hits throughput | 0.000 M ops/s | 0.000 M ops/s ( ~ )
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 0.404 M ops/s | 0.403 M ops/s ( ~ )
| +- hits latency | 2481.190 ns/op | 2487.555 ns/op ( ~ )
| +- important_hits throughput | 0.102 M ops/s | 0.101 M ops/s ( ~ )
The on_lookup test in {cgrp,task}_ls_recursion.c is removed
because the bpf_local_storage_lookup is no longer traceable
and adding tracepoint will make the compiler generate worse
code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/
Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
After the series "Annotate kfuncs in .BTF_ids section"[0], kfuncs can be
generated from bpftool. Let's mark the existing cpumask kfunc declarations
__weak so they don't conflict with definitions that will eventually come
from vmlinux.h.
[0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20240206081416.26242-5-laoar.shao@gmail.com
|
|
[Differences from V2:
- Remove conditionals in the source files pragmas, as the
pragma is supported by both GCC and clang.]
Both GCC and clang implement the -Wno-address-of-packed-member
warning, which is enabled by -Wall, that warns about taking the
address of a packed struct field when it can lead to an "unaligned"
address.
This triggers the following errors (-Werror) when building three
particular BPF selftests with GCC:
progs/test_cls_redirect.c
986 | if (ipv4_is_fragment((void *)&encap->ip)) {
progs/test_cls_redirect_dynptr.c
410 | pkt_ipv4_checksum((void *)&encap_gre->ip);
progs/test_cls_redirect.c
521 | pkt_ipv4_checksum((void *)&encap_gre->ip);
progs/test_tc_tunnel.c
232 | set_ipv4_csum((void *)&h_outer.ip);
These warnings do not signal any real problem in the tests as far as I
can see.
This patch adds pragmas to these test files that inhibit the
-Waddress-of-packed-member warning.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com
|
|
Add selftests covering the following cases:
- A static or global subprog called from within a RCU read section works
- A static subprog taking an RCU read lock which is released in caller works
- A static subprog releasing the caller's RCU read lock works
Global subprogs that leave the lock in an imbalanced state will not
work, as they are verified separately, so ensure those cases fail as
well.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add selftests for static subprog calls within bpf_spin_lock critical
section, and ensure we still reject global subprog calls. Also test the
case where a subprog call will unlock the caller's held lock, or the
caller will unlock a lock taken by a subprog call, ensuring correct
transfer of lock state across frames on exit.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, calling any helpers, kfuncs, or subprogs except the graph
data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock
is not allowed. One of the original motivations of this decision was to
force the BPF programmer's hand into keeping the bpf_spin_lock critical
section small, and to ensure the execution time of the program does not
increase due to lock waiting times. In addition to this, some of the
helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock.
However, when it comes to subprog calls, atleast for static subprogs,
the verifier is able to explore their instructions during verification.
Therefore, it is similar in effect to having the same code inlined into
the critical section. Hence, not allowing static subprog calls in the
bpf_spin_lock critical section is mostly an annoyance that needs to be
worked around, without providing any tangible benefit.
Unlike static subprog calls, global subprog calls are not safe to permit
within the critical section, as the verifier does not explore them
during verification, therefore whether the same lock will be taken
again, or unlocked, cannot be ascertained.
Therefore, allow calling static subprogs within a bpf_spin_lock critical
section, and only reject it in case the subprog linkage is global.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
"r" is used to receive the return value of test_2 in bpf_testmod.c, but it
is not actually used. So, we remove "r" and change the return type to
"void".
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Somehow recently I frequently hit the following test failure
with either ./test_progs or ./test_progs-cpuv4:
serial_test_ptr_untrusted:PASS:skel_open 0 nsec
serial_test_ptr_untrusted:PASS:lsm_attach 0 nsec
serial_test_ptr_untrusted:PASS:raw_tp_attach 0 nsec
serial_test_ptr_untrusted:FAIL:cmp_tp_name unexpected cmp_tp_name: actual -115 != expected 0
#182 ptr_untrusted:FAIL
Further investigation found the failure is due to
bpf_probe_read_user_str()
where reading user-level string attr->raw_tracepoint.name
is not successfully, most likely due to the
string itself still in disk and not populated into memory yet.
One solution is do a printf() call of the string before doing bpf
syscall which will force the raw_tracepoint.name into memory.
But I think a more robust solution is to use bpf_copy_from_user()
which is used in sleepable program and can tolerate page fault,
and the fix here used the latter approach.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240204194452.2785936-1-yonghong.song@linux.dev
|
|
Add extra layer of global functions to ensure that passing around
(trusted) PTR_TO_BTF_ID_OR_NULL registers works as expected. We also
extend trusted_task_arg_nullable subtest to check three possible valid
argumements: known NULL, known non-NULL, and maybe NULL cases.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check that stacksafe() compares spilled scalars with STACK_MISC.
The following combinations are explored:
- old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID}
(plus error in unpriv mode);
- old spill of precise scalar is not equivalent to cur STACK_MISC;
- old STACK_MISC is equivalent to cur scalar;
- old STACK_MISC is not equivalent to cur non-scalar.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
|
|
The previous commit allowed to preserve boundaries and track IDs of
scalars on narrowing fills. Add test cases for that pattern.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
|
|
When the width of a fill is smaller than the width of the preceding
spill, the information about scalar boundaries can still be preserved,
as long as it's coerced to the right width (done by coerce_reg_to_size).
Even further, if the actual value fits into the fill width, the ID can
be preserved as well for further tracking of equal scalars.
Implement the above improvements, which makes narrowing fills behave the
same as narrowing spills and MOVs between registers.
Two tests are adjusted to accommodate for endianness differences and to
take into account that it's now allowed to do a narrowing fill from the
least significant bits.
reg_bounds_sync is added to coerce_reg_to_size to correctly adjust
umin/umax boundaries after the var_off truncation, for example, a 64-bit
value 0xXXXXXXXX00000000, when read as a 32-bit, gets umin = 0, umax =
0xFFFFFFFF, var_off = (0x0; 0xffffffff00000000), which needs to be
synced down to umax = 0, otherwise reg_bounds_sanity_check doesn't pass.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-4-maxtram95@gmail.com
|
|
The previous commit added tracking for unbounded scalars on spill. Add
the test case to check the new functionality.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-3-maxtram95@gmail.com
|
|
Support the pattern where an unbounded scalar is spilled to the stack,
then boundary checks are performed on the src register, after which the
stack frame slot is refilled into a register.
Before this commit, the verifier didn't treat the src register and the
stack slot as related if the src register was an unbounded scalar. The
register state wasn't copied, the id wasn't preserved, and the stack
slot was marked as STACK_MISC. Subsequent boundary checks on the src
register wouldn't result in updating the boundaries of the spilled
variable on the stack.
After this commit, the verifier will preserve the bond between src and
dst even if src is unbounded, which permits to do boundary checks on src
and refill dst later, still remembering its boundaries. Such a pattern
is sometimes generated by clang when compiling complex long functions.
One test is adjusted to reflect that now unbounded scalars are tracked.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-2-maxtram95@gmail.com
|
|
Use more ergonomic bpf_core_cast() macro instead of bpf_rdonly_cast() in
selftests code.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-3-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add bpf_core_cast() macro that wraps bpf_rdonly_cast() kfunc. It's more
ergonomic than kfunc, as it automatically extracts btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts result
to (T *) pointer. See the definition of the macro, it's self-explanatory.
libbpf declares bpf_rdonly_cast() extern as __weak __ksym and should be
safe to not conflict with other possible declarations in user code.
But we do have a conflict with current BPF selftests that declare their
externs with first argument as `void *obj`, while libbpf opts into more
permissive `const void *obj`. This causes conflict, so we fix up BPF
selftests uses in the same patch.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add a bunch of test cases validating behavior of __arg_trusted and its
combination with __arg_nullable tag. We also validate CO-RE flavor
support by kernel for __arg_trusted args.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests for LSM interactions (both bpf_token_capable and bpf_token_cmd
LSM hooks) with BPF token in bpf() subsystem. Now child process passes
back token FD for parent to be able to do tests with token originating
in "wrong" userns. But we also create token in initns and check that
token LSMs don't accidentally reject BPF operations when capable()
checks pass without BPF token.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-31-andrii@kernel.org
|
|
Add a few tests that attempt to load BPF object containing privileged
map, program, and the one requiring mandatory BTF uploading into the
kernel (to validate token FD propagation to BPF_BTF_LOAD command).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-27-andrii@kernel.org
|
|
Create a new struct_ops type called bpf_testmod_ops within the bpf_testmod
module. When a struct_ops object is registered, the bpf_testmod module will
invoke test_2 from the module.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-15-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Adding fill_link_info test for perf event and testing we
get its values back through the bpf_link_info interface.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240119110505.400573-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some of the BPF selftests use the "p" constraint in inline assembly
snippets, for input operands for MOV (rN = rM) instructions.
This is mainly done via the __imm_ptr macro defined in
tools/testing/selftests/bpf/progs/bpf_misc.h:
#define __imm_ptr(name) [name]"p"(&name)
Example:
int consume_first_item_only(void *ctx)
{
struct bpf_iter_num iter;
asm volatile (
/* create iterator */
"r1 = %[iter];"
[...]
:
: __imm_ptr(iter)
: CLOBBERS);
[...]
}
The "p" constraint is a tricky one. It is documented in the GCC manual
section "Simple Constraints":
An operand that is a valid memory address is allowed. This is for
``load address'' and ``push address'' instructions.
p in the constraint must be accompanied by address_operand as the
predicate in the match_operand. This predicate interprets the mode
specified in the match_operand as the mode of the memory reference for
which the address would be valid.
There are two problems:
1. It is questionable whether that constraint was ever intended to be
used in inline assembly templates, because its behavior really
depends on compiler internals. A "memory address" is not the same
than a "memory operand" or a "memory reference" (constraint "m"), and
in fact its usage in the template above results in an error in both
x86_64-linux-gnu and bpf-unkonwn-none:
foo.c: In function ‘bar’:
foo.c:6:3: error: invalid 'asm': invalid expression as operand
6 | asm volatile ("r1 = %[jorl]" : : [jorl]"p"(&jorl));
| ^~~
I would assume the same happens with aarch64, riscv, and most/all
other targets in GCC, that do not accept operands of the form A + B
that are not wrapped either in a const or in a memory reference.
To avoid that error, the usage of the "p" constraint in internal GCC
instruction templates is supposed to be complemented by the 'a'
modifier, like in:
asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl));
Internally documented (in GCC's final.cc) as:
%aN means expect operand N to be a memory address
(not a memory reference!) and print a reference
to that address.
That works because when the modifier 'a' is found, GCC prints an
"operand address", which is not the same than an "operand".
But...
2. Even if we used the internal 'a' modifier (we shouldn't) the 'rN =
rM' instruction really requires a register argument. In cases
involving automatics, like in the examples above, we easily end with:
bar:
#APP
r1 = r10-4
#NO_APP
In other cases we could conceibly also end with a 64-bit label that
may overflow the 32-bit immediate operand of `rN = imm32'
instructions:
r1 = foo
All of which is clearly wrong.
clang happens to do "the right thing" in the current usage of __imm_ptr
in the BPF tests, because even with -O2 it seems to "reload" the
fp-relative address of the automatic to a register like in:
bar:
r1 = r10
r1 += -4
#APP
r1 = r1
#NO_APP
Which is what GCC would generate with -O0. Whether this is by chance
or by design, the compiler shouln't be expected to do that reload
driven by the "p" constraint.
This patch changes the usage of the "p" constraint in the BPF
selftests macros to use the "r" constraint instead. If a register is
what is required, we should let the compiler know.
Previous discussion in bpf@vger:
https://lore.kernel.org/bpf/87h6p5ebpb.fsf@oracle.com/T/#ef0df83d6975c34dff20bf0dd52e078f5b8ca2767
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240123181309.19853-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
GCC emits a warning:
progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized]
when an uninialized op is used with a "+r" constraint. The + modifier
means a read-write operand, but that operand in the selftest is just
written to.
This patch changes the selftest to use a "=r" constraint. This
pacifies GCC.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: Yonghong Song <yhs@meta.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240123205624.14746-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
VLAs are not supported by either the BPF port of clang nor GCC. The
selftest test_xdp_dynptr.c contains the following code:
const size_t tcphdr_sz = sizeof(struct tcphdr);
const size_t udphdr_sz = sizeof(struct udphdr);
const size_t ethhdr_sz = sizeof(struct ethhdr);
const size_t iphdr_sz = sizeof(struct iphdr);
const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);
[...]
static __always_inline int handle_ipv4(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
{
__u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
__u8 iph_buffer_tcp[iphdr_sz + tcphdr_sz];
__u8 iph_buffer_udp[iphdr_sz + udphdr_sz];
[...]
}
The eth_buffer, iph_buffer_tcp and other automatics are fixed size
only if the compiler optimizes away the constant global variables.
clang does this, but GCC does not, turning these automatics into
variable length arrays.
This patch removes the global variables and turns these values into
preprocessor constants. This makes the selftest to build properly
with GCC.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: Yonghong Song <yhs@meta.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240123201729.16173-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check that bpf_object__load() successfully creates map_in_maps
with BPF_MAP_TYPE_PERF_EVENT_ARRAY values.
These changes cover fix in the previous patch
"libbpf: Apply map_set_def_max_entries() for inner_maps on creation".
A command line output is:
- w/o fix
$ sudo ./test_maps
libbpf: map 'mim_array_pe': failed to create inner map: -22
libbpf: map 'mim_array_pe': failed to create: Invalid argument(-22)
libbpf: failed to load object './test_map_in_map.bpf.o'
Failed to load test prog
- with fix
$ sudo ./test_maps
...
test_maps: OK, 0 SKIPPED
Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240117130619.9403-2-conquistador@yandex-team.ru
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit adds a sample selftest to demonstrate how we can use
bpf_sk_assign_tcp_reqsk() as the backend of SYN Proxy.
The test creates IPv4/IPv6 x TCP connections and transfer messages
over them on lo with BPF tc prog attached.
The tc prog will process SYN and returns SYN+ACK with the following
ISN and TS. In a real use case, this part will be done by other
hosts.
MSB LSB
ISN: | 31 ... 8 | 7 6 | 5 | 4 | 3 2 1 0 |
| Hash_1 | MSS | ECN | SACK | WScale |
TS: | 31 ... 8 | 7 ... 0 |
| Random | Hash_2 |
WScale in SYN is reused in SYN+ACK.
The client returns ACK, and tc prog will recalculate ISN and TS
from ACK and validate SYN Cookie.
If it's valid, the prog calls kfunc to allocate a reqsk for skb and
configure the reqsk based on the argument created from SYN Cookie.
Later, the reqsk will be processed in cookie_v[46]_check() to create
a connection.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240115205514.68364-7-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a selftest with a 4 bytes BPF_ST of 0 where the store is not
8-byte aligned. The goal is to ensure that STACK_ZERO is properly
marked in stack slots and the STACK_ZERO value can propagate
properly during the load.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240110051355.2737232-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With patch set [1], precision backtracing supports register spill/fill
to/from the stack. The patch [2] allows initial imprecise register spill
with content 0. This is a common case for cpuv3 and lower for
initializing the stack variables with pattern
r1 = 0
*(u64 *)(r10 - 8) = r1
and the [2] has demonstrated good verification improvement.
For cpuv4, the initialization could be
*(u64 *)(r10 - 8) = 0
The current verifier marks the r10-8 contents with STACK_ZERO.
Similar to [2], let us permit the above insn to behave like
imprecise register spill which can reduce number of verified states.
The change is in function check_stack_write_fixed_off().
Before this patch, spilled zero will be marked as STACK_ZERO
which can provide precise values. In check_stack_write_var_off(),
STACK_ZERO will be maintained if writing a const zero
so later it can provide precise values if needed.
The above handling of '*(u64 *)(r10 - 8) = 0' as a spill
will have issues in check_stack_write_var_off() as the spill
will be converted to STACK_MISC and the precise value 0
is lost. To fix this issue, if the spill slots with const
zero and the BPF_ST write also with const zero, the spill slots
are preserved, which can later provide precise values
if needed. Without the change in check_stack_write_var_off(),
the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
will fail.
I checked cpuv3 and cpuv4 with and without this patch with veristat.
There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
is only generated with cpuv4.
For cpuv4:
$ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
------------------------------------------ ------------------- --------- --------- --------------- ---------- ---------- -------------
local_storage_bench.bpf.linked3.o get_local 228 168 -60 (-26.32%) 17 14 -3 (-17.65%)
pyperf600_bpf_loop.bpf.linked3.o on_event 6066 4889 -1177 (-19.40%) 403 321 -82 (-20.35%)
test_cls_redirect.bpf.linked3.o cls_redirect 35483 35387 -96 (-0.27%) 2179 2177 -2 (-0.09%)
test_l4lb_noinline.bpf.linked3.o balancer_ingress 4494 4522 +28 (+0.62%) 217 219 +2 (+0.92%)
test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 1432 1455 +23 (+1.61%) 92 94 +2 (+2.17%)
test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3462 3458 -4 (-0.12%) 216 216 +0 (+0.00%)
verifier_iterating_callbacks.bpf.linked3.o widening 52 41 -11 (-21.15%) 4 3 -1 (-25.00%)
xdp_synproxy_kern.bpf.linked3.o syncookie_tc 12412 11719 -693 (-5.58%) 345 330 -15 (-4.35%)
xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 12478 11794 -684 (-5.48%) 346 331 -15 (-4.34%)
test_l4lb_noinline and test_l4lb_noinline_dynptr has minor regression, but
pyperf600_bpf_loop and local_storage_bench gets pretty good improvement.
[1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
[2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240110051348.2737007-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The previous commit implemented assigning IDs to registers holding
scalars before spill. Add the test cases to check the new functionality.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240108205209.838365-10-maxtram95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, when a scalar bounded register is spilled to the stack, its
ID is preserved, but only if was already assigned, i.e. if this register
was MOVed before.
Assign an ID on spill if none is set, so that equal scalars could be
tracked if a register is spilled to the stack and filled into another
register.
One test is adjusted to reflect the change in register IDs.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240108205209.838365-9-maxtram95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a range check is performed on a register that was 32-bit spilled to
the stack, the IDs of the two instances of the register are the same, so
the range should also be the same.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240108205209.838365-6-maxtram95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Verify that infinite loop detection logic separates states with
identical register states but different imprecise scalars spilled to
stack.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240108205209.838365-4-maxtram95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The u64_offset_to_skb_data test is supposed to make a 64-bit fill, but
instead makes a 16-bit one. Fix the test according to its intention and
update the comments accordingly (umax is no longer 0xffff). The 16-bit
fill is covered by u16_offset_to_skb_data.
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240108205209.838365-2-maxtram95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
reviews.llvm.org was LLVM's Phabricator instances for code review. It
has been abandoned in favor of GitHub pull requests. While the majority
of links in the kernel sources still work because of the work Fangrui
has done turning the dynamic Phabricator instance into a static archive,
there are some issues with that work, so preemptively convert all the
links in the kernel sources to point to the commit on GitHub.
Most of the commits have the corresponding differential review link in
the commit message itself so there should not be any loss of fidelity in
the relevant information.
Additionally, fix a typo in the xdpwall.c print ("LLMV" -> "LLVM") while
in the area.
Link: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/172
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240111-bpf-update-llvm-phabricator-links-v2-1-9a7ae976bd64@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test uses bpf_prog_get_info_by_fd() to obtain the xlated
instructions of the program first. Since these instructions have
already been rewritten by the verifier, the tests then checks whether
the rewritten instructions are as expected. And to ensure LLVM generates
code exactly as expected, use inline assembly and a naked function.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240105104819.3916743-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a bunch of global subprogs across variety of program types to
validate expected kernel type enforcement logic for __arg_ctx arguments.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a test case for PTR_TO_FLOW_KEYS alu. Testing if alu with variable
offset on flow_keys is rejected. For the fixed offset success case, we
already have C code coverage to verify (e.g. via bpf_flow.c).
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240115082028.9992-2-sunhao.th@gmail.com
|
|
The patch adds a test to exercise the bpf_iter_udp batching
logic. It specifically tests the case that there are multiple
so_reuseport udp_sk in a bucket of the udp_table.
The test creates two sets of so_reuseport sockets and
each set on a different port. Meaning there will be
two buckets in the udp_table.
The test does the following:
1. read() 3 out of 4 sockets in the first bucket.
2. close() all sockets in the first bucket. This
will ensure the current bucket's offset in
the kernel does not affect the read() of the
following bucket.
3. read() all 4 sockets in the second bucket.
The test also reads one udp_sk at a time from
the bpf_iter_udp prog. The true case in
"do_test(..., bool onebyone)". This is the buggy case
that the previous patch fixed.
It also tests the "false" case in "do_test(..., bool onebyone)",
meaning the userspace reads the whole bucket. There is
no bug in this case but adding this test also while
at it.
Considering the way to have multiple tcp_sk in the same
bucket is similar (by using so_reuseport),
this patch also tests the bpf_iter_tcp even though the
bpf_iter_tcp batching logic works correctly.
Both IP v4 and v6 are exercising the same bpf_iter batching
code path, so only v6 is tested.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240112190530.3751661-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|