|
Pu Lehui says:
====================
Fix accessing first syscall argument on RV64
On RV64, as Ilya mentioned before [0], the first syscall parameter should be
accessed through orig_a0 (see arch/riscv/include/asm/syscall.h), otherwise
selftests like bpf_syscall_macro, vmlinux, test_lsm, etc. will fail on RV64.
Link: https://lore.kernel.org/bpf/20220209021745.2215452-1-iii@linux.ibm.com [0]
v3:
- Fix test case error.
v2: https://lore.kernel.org/all/20240831023646.1558629-1-pulehui@huaweicloud.com/
- Access first syscall argument with CO-RE direct read. (Andrii)
v1: https://lore.kernel.org/all/20240829133453.882259-1-pulehui@huaweicloud.com/
====================
Link: https://lore.kernel.org/r/20240831041934.1629216-1-pulehui@huaweicloud.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
On RV64, as Ilya mentioned before [0], the first syscall parameter should be
accessed through orig_a0 (see arch/riscv/include/asm/syscall.h), otherwise
selftests like bpf_syscall_macro, vmlinux, test_lsm, etc. will fail on RV64.
Let's fix it by using the struct pt_regs style CO-RE direct access.
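For illustration, the "struct pt_regs style CO-RE direct access" can be
sketched roughly as below; the names (pt_regs___riscv_sketch,
syscall_arg1_sketch) are illustrative, not the actual libbpf definitions:

	/* Flavor struct: clang records a CO-RE relocation for the orig_a0
	 * field offset, resolved against the target kernel's BTF at load time. */
	struct pt_regs___riscv_sketch {
		unsigned long orig_a0;
	} __attribute__((preserve_access_index));

	/* Direct CO-RE read of the first syscall argument: a plain load with a
	 * relocated offset, no bpf_probe_read_kernel() helper involved, since
	 * the pt_regs pointer comes straight from the tracing context. */
	static unsigned long syscall_arg1_sketch(void *regs)
	{
		return ((struct pt_regs___riscv_sketch *)regs)->orig_a0;
	}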
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-1-iii@linux.ibm.com [0]
Link: https://lore.kernel.org/bpf/20240831041934.1629216-5-pulehui@huaweicloud.com
|
|
Considering that CO-RE direct read access to the first system call
argument is already available on s390 and arm64, let's enable
test_bpf_syscall_macro:syscall_arg1 on these architectures.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-4-pulehui@huaweicloud.com
|
|
Currently PT_REGS_PARM1_SYSCALL(x) is the same as PT_REGS_PARM1_CORE_SYSCALL(x),
which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
comes directly from the context, let's use a CO-RE direct read to access the
first system call argument.
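From a BPF program author's point of view nothing changes in how the argument
is fetched; only what the macros expand to becomes cheaper. A hedged usage
sketch (standard libbpf macros; prctl is just an example syscall):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	SEC("ksyscall/prctl")
	int BPF_KSYSCALL(handle_prctl, int option)
	{
		/* 'option' is the first syscall argument; with this change the
		 * syscall-argument macros behind BPF_KSYSCALL do a direct CO-RE
		 * load instead of a BPF_CORE_READ() helper read. */
		bpf_printk("prctl option=%d", option);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";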
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-3-pulehui@huaweicloud.com
|
|
Currently PT_REGS_PARM1_SYSCALL(x) is the same as PT_REGS_PARM1_CORE_SYSCALL(x),
which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
comes directly from the context, let's use a CO-RE direct read to access the
first system call argument.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-2-pulehui@huaweicloud.com
|
|
The core part of the selftest, i.e., the je <-> jmp cycle, mimics the
original sched-ext bpf program. The test will fail without the
previous patch.
I tried to create some cases for other potential cycles
(je <-> je, jmp <-> je and jmp <-> jmp) with a similar pattern
to the test in this patch, but failed. So this patch
only contains one test for the je <-> jmp cycle.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221256.37389-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Daniel Hodges reported a jit error when playing with a sched-ext program.
The error message is:
unexpected jmp_cond padding: -4 bytes
But further investigation shows the error is actually due to failed
convergence. The following analysis explains why:
...
pass4, final_proglen=4391:
...
20e: 48 85 ff test rdi,rdi
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
289: 48 85 ff test rdi,rdi
28c: 74 17 je 0x2a5
28e: e9 7f ff ff ff jmp 0x212
293: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
and insn at 0x28e is 5-byte jmp insn with offset -129.
pass5, final_proglen=4392:
...
20e: 48 85 ff test rdi,rdi
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
28d: 48 85 ff test rdi,rdi
290: 74 1a je 0x2ac
292: eb 84 jmp 0x218
294: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 6-byte cond jump insn now since its offset
becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). At the same
time, insn at 0x292 is a 2-byte insn since its offset is -124.
pass6 will repeat the same code as in pass4. pass7 will repeat the same
code as in pass5, and so on. This will prevent eventual convergence.
Passes 1-14 are with padding = 0. At pass15, padding is 1 and related
insn looks like:
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
24d: 48 85 d2 test rdx,rdx
The similar code in pass14:
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
249: 48 85 d2 test rdx,rdx
24c: 74 21 je 0x26f
24e: 48 01 f7 add rdi,rsi
...
Before generating the following insn,
250: 74 21 je 0x273
"padding = 1" enables some checking to ensure nops is either 0 or 4
where
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
nops = INSN_SZ_DIFF - 2
In this specific case,
addrs[i] = 0x24e // from pass14
addrs[i-1] = 0x24d // from pass15
prog - temp = 3 // from 'test rdx,rdx' in pass15
so
nops = -4
and this triggers the failure.
To fix the issue, we need to break cycles of je <-> jmp. For example,
in the above case, we have
211: 74 7d je 0x290
where the offset is 0x7d. If the 2-byte je insn is generated only when
the offset is less than 0x7d (<= 0x7c), the cycle can be broken and
convergence can be achieved.
I did some study on other cases like je <-> je, jmp <-> je and
jmp <-> jmp which may cause cycles. Those are not from actual
reproducible cases since it is pretty hard to construct a test case
for them. The results show that an offset <= 0x7b (0x7b = 123) should
be enough to cover all cases. This patch adds a new helper to generate 8-bit
cond/uncond jmp insns only if the offset is in the range [-128, 123].
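A minimal sketch of the helper described above (simplified; the actual code
is in arch/x86/net/bpf_jit_comp.c and may differ in naming):

	/* Emit the 2-byte (imm8) jump form only when the offset fits [-128, 123].
	 * Capping the upper bound at 123 instead of 127 leaves room for nearby
	 * instructions to grow between passes without flipping this jump back
	 * and forth between its short and long encodings (the cycle above). */
	static bool is_imm8_jmp_offset(int value)
	{
		return value <= 123 && value >= -128;
	}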
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221251.37109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The ARRAY_SIZE macro is more compact and is the conventional form in the Linux source tree.
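For illustration, the typical shape of the conversion (array and function
names hypothetical):

	/* open-coded element count */
	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++)
		run_one(&tests[i]);

	/* with the macro */
	for (i = 0; i < ARRAY_SIZE(tests); i++)
		run_one(&tests[i]);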
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240903072559.292607-1-yangfeng59949@163.com
|
|
Martin KaFai Lau says:
====================
bpf: Follow up on gen_epilogue
From: Martin KaFai Lau <martin.lau@kernel.org>
The set addresses some follow ups on the earlier gen_epilogue
patch set.
====================
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240904180847.56947-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There is a report of a new indentation issue in epilogue_idx.
This patch fixes it.
Fixes: 169c31761c8d ("bpf: Add gen_epilogue to bpf_verifier_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408311622.4GzlzN33-lkp@intel.com/
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240904180847.56947-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch removes the insn_buf array stack usage from
inline_bpf_loop(). Instead, the env->insn_buf is used. The
usage in inline_bpf_loop() needs more than 16 insns, so
INSN_BUF_SIZE needs to be increased from 16 to 32.
The compiler stack size warning on the verifier is gone
after this change.
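A hedged sketch of the resulting shape (simplified; the real definitions live
in the verifier code):

	#define INSN_BUF_SIZE 32	/* raised from 16 so inline_bpf_loop() fits */

	struct bpf_verifier_env {
		/* ...existing fields elided... */
		struct bpf_insn insn_buf[INSN_BUF_SIZE]; /* replaces on-stack insn_buf[] arrays */
	};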
Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240904180847.56947-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In commit ba8de796baf4 ("net: introduce sk_skb_reason_drop function")
kfree_skb_reason() became an inline function and can no longer be traced.
samples/bpf is abandonware by now, and we should slowly but surely
convert whatever makes sense into BPF selftests under
tools/testing/selftests/bpf and just get rid of the rest.
Link: https://github.com/torvalds/linux/commit/ba8de796baf4bdc03530774fb284fe3c97875566
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_30ADAC88CB2915CA57E9512D4460035BA107@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When PROCMAP_QUERY is not defined, a compilation error occurs due to a
mismatch in procmap_query()'s parameters, since procmap_query() is only
called in the file where the function is defined; modify the parameters
so they match.
We also get a warning when building samples/bpf:
trace_helpers.c:252:5: warning: no previous prototype for ‘procmap_query’ [-Wmissing-prototypes]
252 | int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
| ^~~~~~~~~~~~~
As this function is only used in the file, mark it as 'static'.
Fixes: 4e9e07603ecd ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Link: https://lore.kernel.org/r/20240903012839.3178-1-chenyuan_fl@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, BPF_CALL is always jited to indirect call. When target is
within the range of direct call, BPF_CALL can be jited to direct call.
For example, the following BPF_CALL
call __htab_map_lookup_elem
is always jited to indirect call:
mov x10, #0xffffffffffff18f4
movk x10, #0x821, lsl #16
movk x10, #0x8000, lsl #32
blr x10
When the address of target __htab_map_lookup_elem is within the range of
direct call, the BPF_CALL can be jited to:
bl 0xfffffffffd33bc98
This patch does such jit optimization by emitting arm64 direct calls for
BPF_CALL when possible, indirect calls otherwise.
Without this patch, the jit works as follows.
1. First pass
A. Determine the jited position and size for each bpf instruction.
B. Compute the jited image size.
2. Allocate jited image with size computed in step 1.
3. Second pass
A. Adjust jump offset for jump instructions
B. Write the final image.
This works because, for a given bpf prog, regardless of where the jited
image is allocated, the jited result for each instruction is fixed. The
second pass differs from the first only in adjusting the jump offsets,
like changing "jmp imm1" to "jmp imm2", while the position and size of
the "jmp" instruction remain unchanged.
Now consider whether to jit BPF_CALL to an arm64 direct or indirect call
instruction. The choice depends solely on the jump offset: direct call
if the jump offset is within 128MB, indirect call otherwise.
For a given BPF_CALL, the target address is known, so the jump offset is
decided by the jited address of the BPF_CALL instruction. In other words,
for a given bpf prog, the jited result for each BPF_CALL is determined
by its jited address.
The jited address for a BPF_CALL is the jited image address plus the
total jited size of all preceding instructions. For a given bpf prog,
there are clearly no BPF_CALL instructions before the first BPF_CALL
instruction. Since the jited results for all instructions other than
BPF_CALL are fixed, the total jited size preceding the first
BPF_CALL is also fixed. Therefore, once the jited image is allocated,
the jited address for the first BPF_CALL is fixed.
Now that the jited result for the first BPF_CALL is fixed, the jited
results for all instructions preceding the second BPF_CALL are fixed.
So the jited address and result for the second BPF_CALL are also fixed.
Similarly, we can conclude that the jited addresses and results for all
subsequent BPF_CALL instructions are fixed.
This means that, for a given bpf prog, once the jited image is allocated,
the jited address and result for all instructions, including all BPF_CALL
instructions, are fixed.
Based on the observation, with this patch, the jit works as follows.
1. First pass
Estimate the maximum jited image size. In this pass, all BPF_CALLs
are jited to arm64 indirect calls since the jump offsets are unknown
because the jited image is not allocated.
2. Allocate jited image with size estimated in step 1.
3. Second pass
A. Determine the jited result for each BPF_CALL.
B. Determine jited address and size for each bpf instruction.
4. Third pass
A. Adjust jump offset for jump instructions.
B. Write the final image.
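For reference, the "within 128MB" decision boils down to a range check like
the sketch below (arm64 BL encodes a signed 26-bit word offset, i.e. +/-128MB
from the call site; the helper name is illustrative):

	static bool bl_offset_in_range(long long target, long long pc)
	{
		long long off = target - pc;

		/* signed 26-bit immediate, scaled by 4: +/- 2^27 bytes = 128MB */
		return off >= -(1LL << 27) && off < (1LL << 27);
	}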
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240903094407.601107-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The wrong function was used to access the first enum64 element. Substitute
btf_enum(t) with btf_enum64(t) for BTF_KIND_ENUM64.
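For illustration, the two accessors in use (btf_enum(), btf_enum64() and
btf_is_enum64() are libbpf btf.h helpers; the surrounding bpftool code is
simplified):

	const struct btf_type *t = btf__type_by_id(btf, type_id);
	__u32 first_name_off;

	if (btf_is_enum64(t))
		first_name_off = btf_enum64(t)[0].name_off;	/* struct btf_enum64 */
	else
		first_name_off = btf_enum(t)[0].name_off;	/* struct btf_enum */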
Fixes: 94133cf24bb3 ("bpftool: Introduce btf c dump sorting")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240902171721.105253-1-mykyta.yatsenko5@gmail.com
|
|
In bpftool-net documentation, two blank lines are missing in a
recently added example, causing docutils to complain:
$ cd tools/bpf/bpftool
$ make doc
DESCEND Documentation
GEN bpftool-btf.8
GEN bpftool-cgroup.8
GEN bpftool-feature.8
GEN bpftool-gen.8
GEN bpftool-iter.8
GEN bpftool-link.8
GEN bpftool-map.8
GEN bpftool-net.8
<stdin>:189: (INFO/1) Possible incomplete section title.
Treating the overline as ordinary text because it's so short.
<stdin>:192: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
<stdin>:199: (INFO/1) Possible incomplete section title.
Treating the overline as ordinary text because it's so short.
<stdin>:201: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
GEN bpftool-perf.8
GEN bpftool-prog.8
GEN bpftool.8
GEN bpftool-struct_ops.8
Add the missing blank lines.
Fixes: 0d7c06125cea ("bpftool: Add document for net attach/detach on tcx subcommand")
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240901210742.25758-1-qmo@kernel.org
|
|
%.bpf.o objects depend on vmlinux.h, which makes them transitively
dependent on unnecessary libbpf headers. However, vmlinux.h doesn't
actually change that often.
When generating vmlinux.h, compare it to a previous version and update
it only if there are changes.
Example of build time improvement (after first clean build):
$ touch ../../../lib/bpf/bpf.h
$ time make -j8
Before: real 1m37.592s
After: real 0m27.310s
Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed.
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20240828174608.377204-2-ihor.solodrai@pm.me
|
|
Test %.bpf.o objects actually depend only on some libbpf headers.
Define a list of required headers and use it as TRUNNER_BPF_OBJS
dependency.
bpf_*.h list was determined by:
$ grep -rh 'include <bpf/bpf_' progs | sort -u
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240828174608.377204-1-ihor.solodrai@pm.me
Link: https://lore.kernel.org/bpf/CAEf4BzYQ-j2i_xjs94Nn=8+FVfkWt51mLZyiYKiz9oA4Z=pCeA@mail.gmail.com/
|
|
Create a BTF with endianness different from host, make a distilled
base/split BTF pair from it, dump as raw bytes, import again and
verify that endianness is preserved.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com
|
|
New split BTF needs to preserve base's endianness. Similarly, when
creating a distilled BTF, we need to preserve original endianness.
Fix by updating libbpf's btf__distill_base() and btf_new_empty() to retain
the byte order of any source BTF objects when creating new ones.
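A sketch of the invariant this fix establishes, expressed with libbpf's
public accessors (the fix itself lives inside btf_new_empty()/
btf__distill_base()):

	struct btf *split = btf__new_empty_split(base);

	/* with the fix, a BTF derived from 'base' inherits base's byte order,
	 * so this condition should never be true, whatever the host order is */
	if (btf__endianness(split) != btf__endianness(base))
		/* pre-fix behavior: the new BTF silently fell back to host order */
		btf__set_endianness(split, btf__endianness(base));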
Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Fixes: 58e185a0dc35 ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF")
Reported-by: Song Liu <song@kernel.org>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/6358db36c5f68b07873a0a5be2d062b1af5ea5f8.camel@gmail.com/
Link: https://lore.kernel.org/bpf/20240830095150.278881-1-tony.ambardar@gmail.com
|
|
Replace fput() with sockfd_put() in bpf_fd_reuseport_array_update_elem().
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20240830020756.607877-1-ruanjinjie@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
According to the documentation, when building a kernel with the C=2
parameter, all source files should be checked. But this does not happen
for the kernel/bpf/ directory.
$ touch kernel/bpf/core.o
$ make C=2 CHECK=true kernel/bpf/core.o
Outputs:
CHECK scripts/mod/empty.c
CALL scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
CC kernel/bpf/core.o
As can be seen, the compilation is done, but CHECK is not executed. This
happens because kernel/bpf/Makefile defines its own compilation rule and
omits the macro that performs the check.
There is no need to duplicate the build code, and this rule can be
removed so that the generic rules are used.
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/20240830074350.211308-1-legion@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds test cases for the iter next method returning a valid
pointer, which can also be used as usage examples.
Currently the iter next method should return a valid pointer.
iter_next_trusted is the correct usage and tests whether the iter next
method returns a valid pointer. bpf_iter_task_vma_next has the KF_RET_NULL flag,
so the returned pointer may be NULL. We need to check if the pointer
is NULL before using it.
iter_next_trusted_or_null is the incorrect usage. There is no checking
before using the pointer, so it will be rejected by the verifier.
iter_next_rcu and iter_next_rcu_or_null are similar test cases for
KF_RCU_PROTECTED iterators.
iter_next_rcu_not_trusted is used to test that the pointer returned by
the iter next method of a KF_RCU_PROTECTED iterator cannot be passed to
KF_TRUSTED_ARGS kfuncs.
iter_next_ptr_mem_not_trusted is used to test that base type
PTR_TO_MEM should not be combined with type flag PTR_TRUSTED.
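A compact sketch of the "trusted" usage pattern described above (the kfunc
prototypes are written out here as assumptions; in the selftests they come
from shared headers):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
					 struct task_struct *task, __u64 addr) __ksym;
	extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym;
	extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym;

	SEC("?lsm.s/file_open")	/* section/program type is illustrative */
	int iter_next_trusted_sketch(void *ctx)
	{
		struct task_struct *task = bpf_get_current_task_btf();
		struct bpf_iter_task_vma vma_it;
		struct vm_area_struct *vma;

		bpf_iter_task_vma_new(&vma_it, task, 0);
		vma = bpf_iter_task_vma_next(&vma_it);	/* KF_RET_NULL: may be NULL */
		if (vma)	/* after the NULL check the pointer is usable */
			bpf_printk("vm_start=%lu", vma->vm_start);
		bpf_iter_task_vma_destroy(&vma_it);
		return 0;
	}

	char _license[] SEC("license") = "GPL";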
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848709758F6922F02AF9F1F99962@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently we cannot pass the pointer returned by iter next method as
argument to KF_TRUSTED_ARGS or KF_RCU kfuncs, because the pointer
returned by iter next method is not "valid".
This patch sets the pointer returned by iter next method to be valid.
This is based on the fact that if the iterator is implemented correctly,
then the pointer returned from the iter next method should be valid.
This does not make a NULL pointer valid. If the iter next method has the
KF_RET_NULL flag, then the verifier will require the bpf program to
check for a NULL pointer.
A KF_RCU_PROTECTED iterator is a special case: the pointer returned by
its iter next method is only valid within the RCU critical section,
so it is tagged with MEM_RCU, not PTR_TRUSTED.
Another special case is bpf_iter_num_next, which returns a pointer with
base type PTR_TO_MEM. PTR_TO_MEM should not be combined with type flag
PTR_TRUSTED (PTR_TO_MEM already means the pointer is valid).
The pointer returned by the iter next method of other types of iterators
is tagged with PTR_TRUSTED.
In addition, this patch adds get_iter_from_state to help us get the
current iterator from the current state.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB584869F8B448EA1C87B7CDA399962@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Martin KaFai Lau says:
====================
bpf: Add gen_epilogue to bpf_verifier_ops
From: Martin KaFai Lau <martin.lau@kernel.org>
This set allows the subsystem to patch codes before BPF_EXIT.
The verifier ops, .gen_epilogue, is added for this purpose.
One of the use cases will be in the bpf qdisc: the bpf qdisc
subsystem can ensure that skb->dev has the correct value.
The bpf qdisc subsystem can either inline fixing it in the
epilogue or call another kernel function to handle it (e.g. drop)
in the epilogue. Another use case could be in bpf_tcp_ca.c to
enforce that snd_cwnd has a valid value (e.g. a positive value).
v5:
* Removed the skip_cnt argument from adjust_jmp_off() in patch 2.
Instead, reuse the delta argument and skip
the [tgt_idx, tgt_idx + delta) instructions.
* Added a BPF_JMP32_A macro in patch 3.
* Removed pro_epilogue_subprog.c in patch 6.
The pro_epilogue_kfunc.c has covered the subprog case.
Renamed the file pro_epilogue_kfunc.c to pro_epilogue.c.
Some of the SEC names and function names are changed
accordingly (mainly shorten them by removing the _kfunc suffix).
* Added comments to explain the tail_call result in patch 7.
* Fixed the following bpf CI breakages. I ran it in CI
manually to confirm:
https://github.com/kernel-patches/bpf/actions/runs/10590714532
* s390 zext added "w3 = w3". Adjusted the test to
use all ALU64 and BPF_DW to avoid zext.
Also changed the "int a" in the "struct st_ops_args" to "u64 a".
* llvm17 does not take:
*(u64 *)(r1 +0) = 0;
so it is changed to:
r3 = 0;
*(u64 *)(r1 +0) = r3;
v4:
* Fixed a bug in the memcpy in patch 3
The size in the memcpy should be
epilogue_cnt * sizeof(*epilogue_buf)
v3:
* Moved epilogue_buf[16] to env.
Patch 1 is added to move the existing insn_buf[16] to env.
* Fixed a case that the bpf prog has a BPF_JMP that goes back
to the first instruction of the main prog.
The jump back to 1st insn case also applies to the prologue.
Patch 2 is added to handle it.
* If the bpf main prog has multiple BPF_EXIT, use a BPF_JA
to goto the earlier patched epilogue.
Note that there are (BPF_JMP32 | BPF_JA) vs (BPF_JMP | BPF_JA)
details in the patch 3 commit message.
* There are subtle changes in patch 3, so I reset the Reviewed-by.
* Added patch 8 and patch 9 to cover the changes in patch 2 and patch 3.
* Dropped the kfunc call from pro/epilogue and its selftests.
v2:
* Remove the RFC tag. Keep the ordering at where .gen_epilogue is
called in the verifier relative to the check_max_stack_depth().
This will be consistent with the other extra stack_depth
usage like optimize_bpf_loop().
* Use __xlated check provided by the test_loader to
check the patched instructions after gen_pro/epilogue (Eduard).
* Added Patch 3 by Eduard (Thanks!).
====================
Link: https://lore.kernel.org/r/20240829210833.388152-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch tests the epilogue patching when the main prog has
multiple BPF_EXIT. The verifier should have patched the 2nd (and
later) BPF_EXIT with a BPF_JA that goes back to the earlier
patched epilogue instructions.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a pro/epilogue test when the main prog has a goto insn
that goes back to the very first instruction of the prog. It is
to test the correctness of the adjust_jmp_off(prog, 0, delta)
after the verifier has applied the prologue and/or epilogue patch.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a gen_epilogue test to test a main prog
using a bpf_tail_call.
A non test_loader test is used. The tailcall target program,
"test_epilogue_subprog", needs to be used in a struct_ops map
before it can be loaded. Another struct_ops map is also needed
to host the actual "test_epilogue_tailcall" struct_ops program
that does the bpf_tail_call. The earlier test_loader patch
will attach all struct_ops maps but the bpf_testmod.c does
not support >1 attached struct_ops.
The earlier patch used the test_loader which has already covered
checking for the patched pro/epilogue instructions. This is done
by the __xlated tag.
This patch uses the regular skel load and syscall test to do the
tailcall test, which also allows directly passing the
"struct st_ops_args *args" as ctx_in to the
SEC("syscall") program.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This test adds a new struct_ops "bpf_testmod_st_ops" in bpf_testmod.
The ops of the bpf_testmod_st_ops is triggered by new kfunc calls
"bpf_kfunc_st_ops_test_*logue". These new kfunc calls are
primarily used by the SEC("syscall") program. The test triggering
sequence is like:
SEC("syscall")
syscall_prologue(struct st_ops_args *args)
bpf_kfunc_st_op_test_prologue(args)
st_ops->test_prologue(args)
.gen_prologue adds 1000 to args->a
.gen_epilogue adds 10000 to args->a
.gen_epilogue will also set the r0 to 2 * args->a.
The .gen_prologue and .gen_epilogue of the bpf_testmod_st_ops
will test the prog->aux->attach_func_name to decide whether
they need to generate code.
The main programs of the pro_epilogue.c will call a
new kfunc bpf_kfunc_st_ops_inc10 which does "args->a += 10".
It will also call a subprog() which does "args->a += 1".
This patch uses the test_loader infra to check the __xlated
instructions patched after gen_prologue and/or gen_epilogue.
The __xlated check is based on Eduard's example (Thanks!) in v1.
args->a is returned by the struct_ops prog (either the main prog
or the epilogue). Thus, the __retval of the SEC("syscall") prog
is checked. For example, when triggering the ops in the
'SEC("struct_ops/test_epilogue") int test_epilogue'
The expected args->a is +1 (subprog call) + 10 (kfunc call)
+ 10000 (.gen_epilogue) = 10011.
The expected return value is 2 * 10011 (.gen_epilogue).
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In test_loader based tests, call bpf_map__attach_struct_ops()
before bpf_prog_test_run_opts() in order to trigger the
bpf_struct_ops->reg() callbacks on the kernel side.
This allows using the __retval macro for struct_ops tests.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The bpf_testmod needs to use the bpf_tail_call helper in
a later selftest patch. This patch exports bpf_base_func_proto
with EXPORT_SYMBOL_GPL.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar
to the existing .gen_prologue. Instead of allowing a subsystem
to run code at the beginning of a bpf prog, it allows the subsystem
to run code just before the bpf prog exit.
One of the use cases is to allow the upcoming bpf qdisc to ensure that
the skb->dev is the same as the qdisc->dev_queue->dev. The bpf qdisc
struct_ops implementation could either fix it up or drop the skb.
Another use case could be in bpf_tcp_ca.c to enforce that snd_cwnd
has a sane value (e.g. non-zero).
The epilogue can do useful things (like checking skb->dev) if it
can access the bpf prog's ctx. Unlike the prologue, r1 may not hold the
ctx pointer. This patch saves the r1 in the stack if the .gen_epilogue
has returned some instructions in the "epilogue_buf".
The existing .gen_prologue is done in convert_ctx_accesses().
The new .gen_epilogue is done in the convert_ctx_accesses() also.
When it sees the (BPF_JMP | BPF_EXIT) instruction, it will be patched
with the earlier generated "epilogue_buf". The epilogue patching is
only done for the main prog.
Only one epilogue will be patched to the main program. When the
bpf prog has multiple BPF_EXIT instructions, a BPF_JA is used
to goto the earlier patched epilogue. The majority of the archs
support (BPF_JMP32 | BPF_JA): x86, arm, s390, riscv64, loongarch,
powerpc and arc. This patch keeps it simple and always
uses (BPF_JMP32 | BPF_JA). A new macro BPF_JMP32_A is added to
generate the (BPF_JMP32 | BPF_JA) insn.
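Presumably, the new macro mirrors the existing BPF_JMP_A in
include/linux/filter.h, with the offset going into the 32-bit imm field
(a sketch, not necessarily the exact definition):

	#define BPF_JMP32_A(IMM)					\
		((struct bpf_insn) {					\
			.code  = BPF_JMP32 | BPF_JA,			\
			.dst_reg = 0,					\
			.src_reg = 0,					\
			.off   = 0,					\
			.imm   = IMM })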
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The next patch will add a ctx ptr saving instruction
"r1 = *(u64 *)(r10 -8)" at the beginning of the main prog
when there is an epilogue patch (by the .gen_epilogue() verifier
ops added in the next patch).
There is one corner case if the bpf prog has a BPF_JMP that jumps
to the 1st instruction. It needs an adjustment such that
those BPF_JMP instructions won't jump to the newly added
ctx saving instruction.
The commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to the 1st insn.")
has the details on this case.
Note that the jump back to 1st instruction is not limited to the
ctx ptr saving instruction. The same also applies to the prologue.
A later test, pro_epilogue_goto_start.c, has a test for the prologue
only case.
Thus, this patch does one adjustment after gen_prologue and
the future ctx ptr saving. It is done by
adjust_jmp_off(env->prog, 0, delta) where delta has the total
number of instructions in the prologue and
the future ctx ptr saving instruction.
The adjust_jmp_off(env->prog, 0, delta) assumes that the
prologue does not have a goto 1st instruction itself.
To accommodate the prologue might have a goto 1st insn itself,
this patch changes the adjust_jmp_off() to skip considering
the instructions between [tgt_idx, tgt_idx + delta).
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch moves the 'struct bpf_insn insn_buf[16]' stack usage
to the bpf_verifier_env. A '#define INSN_BUF_SIZE 16' is also added
to replace the ARRAY_SIZE(insn_buf) usages.
Both convert_ctx_accesses() and do_misc_fixup() are changed
to use the env->insn_buf.
It is a refactoring work for adding the epilogue_buf[16] in a later patch.
With this patch, the stack size usage decreased.
Before:
./kernel/bpf/verifier.c:22133:5: warning: stack frame size (2584)
After:
./kernel/bpf/verifier.c:22184:5: warning: stack frame size (2264)
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use kvmemdup instead of kvmalloc() + memcpy() to simplify the
code.
No functional change intended.
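The shape of the conversion, for illustration (variable names hypothetical):

	/* before */
	dst = kvmalloc(size, GFP_KERNEL);
	if (!dst)
		return -ENOMEM;
	memcpy(dst, src, size);

	/* after: one call with the same allocate-and-copy semantics */
	dst = kvmemdup(src, size, GFP_KERNEL);
	if (!dst)
		return -ENOMEM;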
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20240828062128.1223417-1-lihongbo22@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In verifier.rst, there is a typo in section 'Register parentage chains'.
Caller saved registers are r0-r5, callee saved registers are r6-r9.
From the context, it should say callee saved registers rather than caller
saved registers. This may confuse users.
Signed-off-by: Yiming Xiang <kxiang@umich.edu>
Link: https://lore.kernel.org/r/20240829031712.198489-1-kxiang@umich.edu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When dropping a local kptr, any kptr stashed into it is supposed to be
freed through bpf_obj_free_fields->__bpf_obj_drop_impl recursively. Add a
test to make sure it happens.
The test first stashes a referenced kptr to "struct task" into a local
kptr and gets the reference count of the task. Then, it drops the local
kptr and reads the reference count of the task again. Since
bpf_obj_free_fields and __bpf_obj_drop_impl will go through the local kptr
recursively during bpf_obj_drop, the dtor of the stashed task kptr should
eventually be called. The second reference count should be one less than
the first one.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240827011301.608620-1-amery.hung@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We do an ugly copying of options in bpf_object__open_skeleton() just to
be able to set object name from skeleton's recorded name (while still
allowing user to override it through opts->object_name).
This is not just ugly, but it also is broken due to memcpy() that
doesn't take into account potential skel_opts' and user-provided opts'
sizes differences due to backward and forward compatibility. This leads
to copying over extra bytes and then failing to validate options
properly. It could, technically, lead also to SIGSEGV, if we are unlucky.
So just get rid of that memory copy completely and instead pass
default object name into bpf_object_open() directly, simplifying all
this significantly. The rule now is that obj_name should be non-NULL for
bpf_object_open() when called with in-memory buffer, so validate that
explicitly as well.
We adapt bpf_object__open_mem() to this as well and generate a default
name (based on the buffer memory address and size) outside of bpf_object_open().
Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Daniel Müller <deso@posteo.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240827203721.1145494-1-andrii@kernel.org
|
|
This patch adds test cases for zero offset (implicit cast) or non-zero
offset pointer as KF_ACQUIRE kfuncs argument. Currently KF_ACQUIRE
kfuncs should support passing in pointers like &sk->sk_write_queue
(non-zero offset) or &sk->__sk_common (zero offset) and not be rejected
by the verifier.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848CB6F0D4D9068669A905B99952@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently we cannot pass zero offset (implicit cast) or non-zero offset
pointers to KF_ACQUIRE kfuncs. This is because KF_ACQUIRE kfuncs
require strict type matching, but a zero or non-zero offset does
not change the type of the pointer, which causes the ebpf program to be
rejected by the verifier.
This can cause some problems, one example is that bpf_skb_peek_tail
kfunc [0] cannot be implemented by just passing in non-zero offset
pointers. We cannot pass pointers like &sk->sk_write_queue (non-zero
offset) or &sk->__sk_common (zero offset) to KF_ACQUIRE kfuncs.
This patch makes KF_ACQUIRE kfuncs not require strict type matching.
[0]: https://lore.kernel.org/bpf/AM6PR03MB5848CA39CB4B7A4397D380B099B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
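For illustration, the kind of call that is now accepted (bpf_skb_peek_tail()
is the proposed kfunc from [0], not an existing upstream kfunc):

	struct sk_buff *skb;

	/* Taking &sk->sk_write_queue (non-zero offset) or &sk->__sk_common
	 * (zero offset) does not change the register's type from "struct sock *",
	 * so strict type matching for KF_ACQUIRE kfuncs used to reject this. */
	skb = bpf_skb_peek_tail(&sk->sk_write_queue);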
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848FD2BD89BF0B6B5AA3B4C99952@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Smatch reported the following warning:
./tools/testing/selftests/bpf/testing_helpers.c:455 get_xlated_program()
warn: variable dereferenced before check 'buf' (see line 454)
It seems correct, so let's modify it based on its suggestion.
Actually, commit b23ed4d74c4d ("selftests/bpf: Fix invalid pointer
check in get_xlated_program()") fixed this issue in test_verifier.c
once, but it was reverted this time.
Let's solve this issue with the minimal changes possible.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/1eb3732f-605a-479d-ba64-cd14250cbf91@stanley.mountain/
Fixes: b4b7a4099b8c ("selftests/bpf: Factor out get_xlated_program() helper")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Link: https://lore.kernel.org/r/20240820023622.29190-1-hao.ge@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Xu Kuohai says:
====================
bpf, arm64: Simplify jited prologue/epilogue
From: Xu Kuohai <xukuohai@huawei.com>
The arm64 jit blindly saves/restores all callee-saved registers, making
the jited result look a bit too complicated. For example, for an empty
prog, the jited result is:
0: bti jc
4: mov x9, lr
8: nop
c: paciasp
10: stp fp, lr, [sp, #-16]!
14: mov fp, sp
18: stp x19, x20, [sp, #-16]!
1c: stp x21, x22, [sp, #-16]!
20: stp x26, x25, [sp, #-16]!
24: mov x26, #0
28: stp x26, x25, [sp, #-16]!
2c: mov x26, sp
30: stp x27, x28, [sp, #-16]!
34: mov x25, sp
38: bti j // tailcall target
3c: sub sp, sp, #0
40: mov x7, #0
44: add sp, sp, #0
48: ldp x27, x28, [sp], #16
4c: ldp x26, x25, [sp], #16
50: ldp x26, x25, [sp], #16
54: ldp x21, x22, [sp], #16
58: ldp x19, x20, [sp], #16
5c: ldp fp, lr, [sp], #16
60: mov x0, x7
64: autiasp
68: ret
Clearly, there is no need to save/restore unused callee-saved registers.
This patch does this change, making the jited image only save/restore
the callee-saved registers it uses.
Now the jited result of empty prog is:
0: bti jc
4: mov x9, lr
8: nop
c: paciasp
10: stp fp, lr, [sp, #-16]!
14: mov fp, sp
18: stp xzr, x26, [sp, #-16]!
1c: mov x26, sp
20: bti j // tailcall target
24: mov x7, #0
28: ldp xzr, x26, [sp], #16
2c: ldp fp, lr, [sp], #16
30: mov x0, x7
34: autiasp
38: ret
====================
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240826071624.350108-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The arm64 jit blindly saves/restores all callee-saved registers, making
the jited result look a bit too complicated. For example, for an empty
prog, the jited result is:
0: bti jc
4: mov x9, lr
8: nop
c: paciasp
10: stp fp, lr, [sp, #-16]!
14: mov fp, sp
18: stp x19, x20, [sp, #-16]!
1c: stp x21, x22, [sp, #-16]!
20: stp x26, x25, [sp, #-16]!
24: mov x26, #0
28: stp x26, x25, [sp, #-16]!
2c: mov x26, sp
30: stp x27, x28, [sp, #-16]!
34: mov x25, sp
38: bti j // tailcall target
3c: sub sp, sp, #0
40: mov x7, #0
44: add sp, sp, #0
48: ldp x27, x28, [sp], #16
4c: ldp x26, x25, [sp], #16
50: ldp x26, x25, [sp], #16
54: ldp x21, x22, [sp], #16
58: ldp x19, x20, [sp], #16
5c: ldp fp, lr, [sp], #16
60: mov x0, x7
64: autiasp
68: ret
Clearly, there is no need to save/restore unused callee-saved registers.
This patch does this change, making the jited image only save/restore
the callee-saved registers it uses.
Now the jited result of empty prog is:
0: bti jc
4: mov x9, lr
8: nop
c: paciasp
10: stp fp, lr, [sp, #-16]!
14: mov fp, sp
18: stp xzr, x26, [sp, #-16]!
1c: mov x26, sp
20: bti j // tailcall target
24: mov x7, #0
28: ldp xzr, x26, [sp], #16
2c: ldp fp, lr, [sp], #16
30: mov x0, x7
34: autiasp
38: ret
Since the bpf prog saves/restores its own callee-saved registers as needed,
to make tailcall work correctly, the caller needs to restore its saved
registers before the tailcall, and the callee needs to save its callee-saved
registers after the tailcall. These extra restore/save instructions
increase performance overhead.
[1] provides 2 benchmarks for tailcall scenarios. Below is the perf
number measured in an arm64 KVM guest. The result indicates that the
performance difference before and after the patch in typical tailcall
scenarios is negligible.
- Before:
Performance counter stats for './test_progs -t tailcalls' (5 runs):
4313.43 msec task-clock # 0.874 CPUs utilized ( +- 0.16% )
574 context-switches # 133.073 /sec ( +- 1.14% )
0 cpu-migrations # 0.000 /sec
538 page-faults # 124.727 /sec ( +- 0.57% )
10697772784 cycles # 2.480 GHz ( +- 0.22% ) (61.19%)
25511241955 instructions # 2.38 insn per cycle ( +- 0.08% ) (66.70%)
5108910557 branches # 1.184 G/sec ( +- 0.08% ) (72.38%)
2800459 branch-misses # 0.05% of all branches ( +- 0.51% ) (72.36%)
TopDownL1 # 0.60 retiring ( +- 0.09% ) (66.84%)
# 0.21 frontend_bound ( +- 0.15% ) (61.31%)
# 0.12 bad_speculation ( +- 0.08% ) (50.11%)
# 0.07 backend_bound ( +- 0.16% ) (33.30%)
8274201819 L1-dcache-loads # 1.918 G/sec ( +- 0.18% ) (33.15%)
468268 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 4.69% ) (33.16%)
385383 LLC-loads # 89.345 K/sec ( +- 5.22% ) (33.16%)
38296 LLC-load-misses # 9.94% of all LL-cache accesses ( +- 42.52% ) (38.69%)
6886576501 L1-icache-loads # 1.597 G/sec ( +- 0.35% ) (38.69%)
1848585 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 4.52% ) (44.23%)
9043645883 dTLB-loads # 2.097 G/sec ( +- 0.10% ) (44.33%)
416672 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 5.15% ) (49.89%)
6925626111 iTLB-loads # 1.606 G/sec ( +- 0.35% ) (55.46%)
66220 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.88% ) (55.50%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
4.9372 +- 0.0526 seconds time elapsed ( +- 1.07% )
Performance counter stats for './test_progs -t flow_dissector' (5 runs):
10924.50 msec task-clock # 0.945 CPUs utilized ( +- 0.08% )
603 context-switches # 55.197 /sec ( +- 1.13% )
0 cpu-migrations # 0.000 /sec
566 page-faults # 51.810 /sec ( +- 0.42% )
27381270695 cycles # 2.506 GHz ( +- 0.18% ) (60.46%)
56996583922 instructions # 2.08 insn per cycle ( +- 0.21% ) (66.11%)
10321647567 branches # 944.816 M/sec ( +- 0.17% ) (71.79%)
3347735 branch-misses # 0.03% of all branches ( +- 3.72% ) (72.15%)
TopDownL1 # 0.52 retiring ( +- 0.13% ) (66.74%)
# 0.27 frontend_bound ( +- 0.14% ) (61.27%)
# 0.14 bad_speculation ( +- 0.19% ) (50.36%)
# 0.07 backend_bound ( +- 0.42% ) (33.89%)
18740797617 L1-dcache-loads # 1.715 G/sec ( +- 0.43% ) (33.71%)
13715669 L1-dcache-load-misses # 0.07% of all L1-dcache accesses ( +- 32.85% ) (33.34%)
4087551 LLC-loads # 374.164 K/sec ( +- 29.53% ) (33.26%)
267906 LLC-load-misses # 6.55% of all LL-cache accesses ( +- 23.90% ) (38.76%)
15811864229 L1-icache-loads # 1.447 G/sec ( +- 0.12% ) (38.73%)
2976833 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 9.73% ) (44.22%)
20138907471 dTLB-loads # 1.843 G/sec ( +- 0.18% ) (44.15%)
732850 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 11.18% ) (49.64%)
15895726702 iTLB-loads # 1.455 G/sec ( +- 0.15% ) (55.13%)
152075 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 4.71% ) (54.98%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
11.5613 +- 0.0317 seconds time elapsed ( +- 0.27% )
- After:
Performance counter stats for './test_progs -t tailcalls' (5 runs):
4278.78 msec task-clock # 0.871 CPUs utilized ( +- 0.15% )
569 context-switches # 132.982 /sec ( +- 0.58% )
0 cpu-migrations # 0.000 /sec
539 page-faults # 125.970 /sec ( +- 0.43% )
10588986432 cycles # 2.475 GHz ( +- 0.20% ) (60.91%)
25303825043 instructions # 2.39 insn per cycle ( +- 0.08% ) (66.48%)
5110756256 branches # 1.194 G/sec ( +- 0.07% ) (72.03%)
2719569 branch-misses # 0.05% of all branches ( +- 2.42% ) (72.03%)
TopDownL1 # 0.60 retiring ( +- 0.22% ) (66.31%)
# 0.22 frontend_bound ( +- 0.21% ) (60.83%)
# 0.12 bad_speculation ( +- 0.26% ) (50.25%)
# 0.06 backend_bound ( +- 0.17% ) (33.52%)
8163648527 L1-dcache-loads # 1.908 G/sec ( +- 0.33% ) (33.52%)
694979 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 30.53% ) (33.52%)
1902347 LLC-loads # 444.600 K/sec ( +- 48.84% ) (33.69%)
96677 LLC-load-misses # 5.08% of all LL-cache accesses ( +- 43.48% ) (39.30%)
6863517589 L1-icache-loads # 1.604 G/sec ( +- 0.37% ) (39.17%)
1871519 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 6.78% ) (44.56%)
8927782813 dTLB-loads # 2.087 G/sec ( +- 0.14% ) (44.37%)
438237 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 6.00% ) (49.75%)
6886906831 iTLB-loads # 1.610 G/sec ( +- 0.36% ) (55.08%)
67568 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 3.27% ) (54.86%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
4.9114 +- 0.0309 seconds time elapsed ( +- 0.63% )
Performance counter stats for './test_progs -t flow_dissector' (5 runs):
10948.40 msec task-clock # 0.942 CPUs utilized ( +- 0.05% )
615 context-switches # 56.173 /sec ( +- 1.65% )
1 cpu-migrations # 0.091 /sec ( +- 31.62% )
567 page-faults # 51.788 /sec ( +- 0.44% )
27334194328 cycles # 2.497 GHz ( +- 0.08% ) (61.05%)
56656528828 instructions # 2.07 insn per cycle ( +- 0.08% ) (66.67%)
10270389422 branches # 938.072 M/sec ( +- 0.10% ) (72.21%)
3453837 branch-misses # 0.03% of all branches ( +- 3.75% ) (72.27%)
TopDownL1 # 0.52 retiring ( +- 0.16% ) (66.55%)
# 0.27 frontend_bound ( +- 0.09% ) (60.91%)
# 0.14 bad_speculation ( +- 0.08% ) (49.85%)
# 0.07 backend_bound ( +- 0.16% ) (33.33%)
18982866028 L1-dcache-loads # 1.734 G/sec ( +- 0.24% ) (33.34%)
8802454 L1-dcache-load-misses # 0.05% of all L1-dcache accesses ( +- 52.30% ) (33.31%)
2612962 LLC-loads # 238.661 K/sec ( +- 29.78% ) (33.45%)
264107 LLC-load-misses # 10.11% of all LL-cache accesses ( +- 18.34% ) (39.07%)
15793205997 L1-icache-loads # 1.443 G/sec ( +- 0.15% ) (39.09%)
3930802 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 3.72% ) (44.66%)
20097828496 dTLB-loads # 1.836 G/sec ( +- 0.09% ) (44.68%)
961757 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 3.32% ) (50.15%)
15838728506 iTLB-loads # 1.447 G/sec ( +- 0.09% ) (55.62%)
167652 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.28% ) (55.52%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
11.6173 +- 0.0268 seconds time elapsed ( +- 0.23% )
[1] https://lore.kernel.org/bpf/20200724123644.5096-1-maciej.fijalkowski@intel.com/
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240826071624.350108-3-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf prog accesses stack using BPF_FP as the base address and a negative
immediate number as offset. But arm64 ldr/str instructions only support
non-negative immediate number as offset. To simplify the jited result,
commit 5b3d19b9bd40 ("bpf, arm64: Adjust the offset of str/ldr(immediate)
to positive number") introduced FPB to represent the lowest stack address
that the bpf prog being jited may access, and with this address as the
baseline, it converts BPF_FP plus negative immediate offset number to FPB
plus non-negative immediate offset.
Considering that for a given bpf prog, the jited stack space is fixed,
with A64_SP as the lowest address and BPF_FP as the highest address,
we can get rid of FPB and convert BPF_FP plus a negative immediate
offset to A64_SP plus a non-negative immediate offset.
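Concretely, if stack_size is the prog's jited stack size, then
A64_SP == BPF_FP - stack_size for the whole prog, and an access at a
negative offset off from BPF_FP rewrites as:

	BPF_FP + off  ==  (A64_SP + stack_size) + off  ==  A64_SP + (stack_size + off)

where stack_size + off is non-negative for any in-bounds off, so the ldr/str
immediate stays non-negative without needing the extra FPB register.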
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240826071624.350108-2-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
commit 7bd230a26648 ("mm/slab: enable slab allocation tagging for kmalloc
and friends") [1] swapped kmem_cache_alloc_node() for
kmem_cache_alloc_node_noprof(). As a result, tracex4 fails to attach:
linux/samples/bpf$ sudo ./tracex4
libbpf: prog 'bpf_prog2': failed to create kretprobe
'kmem_cache_alloc_node+0x0' perf event: No such file or directory
ERROR: bpf_program__attach failed
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://github.com/torvalds/linux/commit/7bd230a26648ac68ab3731ebbc449090f0ac6a37
Link: https://lore.kernel.org/bpf/tencent_34E5BCCAC5ABF3E81222AD81B1D05F16DE06@qq.com
|
|
This adds tests for both the happy path and
the error path.
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Link: https://lore.kernel.org/r/20240823195101.3621028-2-linux@jordanrome.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds a kfunc wrapper around strncpy_from_user,
which can be called from sleepable BPF programs.
This matches the non-sleepable 'bpf_probe_read_user_str'
helper except it includes an additional 'flags'
param, which allows consumers to clear the entire
destination buffer on success or failure.
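A hedged usage sketch from a sleepable program; the kfunc name and prototype
below (bpf_copy_from_user_str()) are my reading of this series and should be
checked against the actual patch:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	/* assumed prototype of the new kfunc */
	extern int bpf_copy_from_user_str(void *dst, u32 dst__sz,
					  const void *unsafe_ptr__ign, u64 flags) __ksym;

	struct copy_ctx {
		__u64 user_ptr;		/* user-space string address, passed via ctx_in */
	};

	SEC("syscall")			/* SEC("syscall") programs run sleepable */
	int copy_str_sketch(struct copy_ctx *ctx)
	{
		static char buf[64];

		/* flags = 0: plain copy; the added 'flags' param can instead ask
		 * for the rest of buf to be zeroed */
		return bpf_copy_from_user_str(buf, sizeof(buf),
					      (const void *)(long)ctx->user_ptr, 0);
	}

	char _license[] SEC("license") = "GPL";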
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Link: https://lore.kernel.org/r/20240823195101.3621028-1-linux@jordanrome.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Save pkg-config output for libpcap as simply-expanded variables.
For an obscure reason, a 'shell' call in the recursively expanded
LDLIBS/CFLAGS variables makes compilation of *.test.o files non-parallel
when make is executed with the -j option.
While at it, reuse the 'pkg-config --cflags' call to define the
-DTRAFFIC_MONITOR=1 option; its exit status is the same as for
'pkg-config --exists'.
Fixes: f52403b6bfea ("selftests/bpf: Add traffic monitor functions.")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240823194409.774815-1-eddyz87@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Amery Hung says:
====================
Support bpf_kptr_xchg into local kptr
This revision adds substantial changes to patch 2 to support structures
with kptr as the only special btf type. The test is split into
local_kptr_stash and task_kfunc_success to remove dependencies on
bpf_testmod that would break veristat results.
This series allows stashing kptr into local kptr. Currently, kptrs are
only allowed to be stashed into map value with bpf_kptr_xchg(). A
motivating use case of this series is to enable adding referenced kptr to
bpf_rbtree or bpf_list by using allocated object as graph node and the
storage of referenced kptr. For example, a bpf qdisc [0] enqueuing a
referenced kptr to a struct sk_buff* to a bpf_list serving as a fifo:
struct skb_node {
struct sk_buff __kptr *skb;
struct bpf_list_node node;
};
private(A) struct bpf_spin_lock fifo_lock;
private(A) struct bpf_list_head fifo __contains(skb_node, node);
/* In Qdisc_ops.enqueue */
struct skb_node *skbn;
skbn = bpf_obj_new(typeof(*skbn));
if (!skbn)
goto drop;
/* skb is a referenced kptr to struct sk_buff acquired earlier
* but not shown in this code snippet.
*/
skb = bpf_kptr_xchg(&skbn->skb, skb);
if (skb)
/* should not happen; do something below releasing skb to
* satisfy the verifier */
...
bpf_spin_lock(&fifo_lock);
bpf_list_push_back(&fifo, &skbn->node);
bpf_spin_unlock(&fifo_lock);
The implementation first searches for BPF_KPTR when generating program
BTF. Then, we teach the verifier that the destination argument of
bpf_kptr_xchg() can be a local kptr, and use the btf_record in program BTF
to check against the source argument.
This series is mostly developed by Dave, who kindly helped and sent me
the patchset. The selftests in bpf qdisc (WIP) rely on this series to
work.
[0] https://lore.kernel.org/netdev/20240714175130.4051012-10-amery.hung@bytedance.com/
---
v3 -> v4
- Allow struct in prog btf w/ kptr as the only special field type
- Split tests of stashing referenced kptr and local kptr
- v3: https://lore.kernel.org/bpf/20240809005131.3916464-1-amery.hung@bytedance.com/
v2 -> v3
- Fix prog btf memory leak
- Test stashing kptr in prog btf
- Test unstashing kptrs after stashing into local kptrs
- v2: https://lore.kernel.org/bpf/20240803001145.635887-1-amery.hung@bytedance.com/
v1 -> v2
- Fix the document for bpf_kptr_xchg()
- Add a comment explaining changes in the verifier
- v1: https://lore.kernel.org/bpf/20240728030115.3970543-1-amery.hung@bytedance.com/
====================
Link: https://lore.kernel.org/r/20240813212424.2871455-1-amery.hung@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test stashing both referenced kptr and local kptr into local kptrs. Then,
test unstashing them.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Link: https://lore.kernel.org/r/20240813212424.2871455-6-amery.hung@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|