linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-01-04	selftests/bpf: Test re-attachment fix for bpf_tracing_prog_attach	Dmitrii Dolgov
	Add a test case to verify the fix for "prog->aux->dst_trampoline and tgt_prog is NULL" branch in bpf_tracing_prog_attach. The sequence of events: 1. load rawtp program 2. load fentry program with rawtp as target_fd 3. create tracing link for fentry program with target_fd = 0 4. repeat 3 Acked-by: Jiri Olsa <olsajiri@gmail.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Link: https://lore.kernel.org/r/20240103190559.14750-5-9erthalion6@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04	selftests/bpf: Add test for recursive attachment of tracing progs	Dmitrii Dolgov
	Verify the fact that only one fentry prog could be attached to another fentry, building up an attachment chain of limited size. Use existing bpf_testmod as a start of the chain. Acked-by: Jiri Olsa <olsajiri@gmail.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Link: https://lore.kernel.org/r/20240103190559.14750-3-9erthalion6@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04	selftests/bpf: Test gotol with large offsets	Ilya Leoshkevich
	Test gotol with offsets that don't fit into a short (i.e., larger than 32k or smaller than -32k). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20240102193531.3169422-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-03	selftests/bpf: add arg:ctx cases to test_global_funcs tests	Andrii Nakryiko
	Add a few extra cases of global funcs with context arguments. This time rely on "arg:ctx" decl_tag (__arg_ctx macro), but put it next to "classic" cases where context argument has to be of an exact type that BPF verifier expects (e.g., bpf_user_pt_regs_t for kprobe/uprobe). Colocating all these cases separately from other global func args that rely on arg:xxx decl tags (in verifier_global_subprogs.c) allows for simpler backwards compatibility testing on old kernels. All the cases in test_global_func_ctx_args.c are supposed to work on older kernels, which was manually validated during development. Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240104013847.3875810-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-03	selftests/bpf: Add a selftest with > 512-byte percpu allocation size	Yonghong Song
	Add a selftest to capture the verification failure when the allocation size is greater than 512. Acked-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231222031812.1293190-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-03	selftests/bpf: Cope with 512 bytes limit with bpf_global_percpu_ma	Yonghong Song
	In the previous patch, the maximum data size for bpf_global_percpu_ma is 512 bytes. This breaks selftest test_bpf_ma. The test is adjusted in two aspects: - Since the maximum allowed data size for bpf_global_percpu_ma is 512, remove all tests beyond that, names sizes 1024, 2048 and 4096. - Previously the percpu data size is bucket_size - 8 in order to avoid percpu allocation into the next bucket. This patch removed such data size adjustment thanks to Patch 1. Also, a better way to generate BTF type is used than adding a member to the value struct. Acked-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231222031807.1292853-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-03	selftests/bpf: Convert profiler.c to bpf_cmp.	Alexei Starovoitov
	Convert profiler[123].c to "volatile compare" to compare barrier_var() approach vs bpf_cmp_likely() vs bpf_cmp_unlikely(). bpf_cmp_unlikely() produces correct code, but takes much longer to verify: ./veristat -C -e prog,insns,states before after_with_unlikely Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------ --------- --------- ------------------ ---------- ---------- ----------------- kprobe__proc_sys_write 1603 19606 +18003 (+1123.08%) 123 1678 +1555 (+1264.23%) kprobe__vfs_link 11815 70305 +58490 (+495.05%) 971 4967 +3996 (+411.53%) kprobe__vfs_symlink 5464 42896 +37432 (+685.07%) 434 3126 +2692 (+620.28%) kprobe_ret__do_filp_open 5641 44578 +38937 (+690.25%) 446 3162 +2716 (+608.97%) raw_tracepoint__sched_process_exec 2770 35962 +33192 (+1198.27%) 226 3121 +2895 (+1280.97%) raw_tracepoint__sched_process_exit 1526 2135 +609 (+39.91%) 133 208 +75 (+56.39%) raw_tracepoint__sched_process_fork 265 337 +72 (+27.17%) 19 24 +5 (+26.32%) tracepoint__syscalls__sys_enter_kill 18782 140407 +121625 (+647.56%) 1286 12176 +10890 (+846.81%) bpf_cmp_likely() is equivalent to barrier_var(): ./veristat -C -e prog,insns,states before after_with_likely Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------ --------- --------- -------------- ---------- ---------- ------------- kprobe__proc_sys_write 1603 1663 +60 (+3.74%) 123 127 +4 (+3.25%) kprobe__vfs_link 11815 12090 +275 (+2.33%) 971 971 +0 (+0.00%) kprobe__vfs_symlink 5464 5448 -16 (-0.29%) 434 426 -8 (-1.84%) kprobe_ret__do_filp_open 5641 5739 +98 (+1.74%) 446 446 +0 (+0.00%) raw_tracepoint__sched_process_exec 2770 2608 -162 (-5.85%) 226 216 -10 (-4.42%) raw_tracepoint__sched_process_exit 1526 1526 +0 (+0.00%) 133 133 +0 (+0.00%) raw_tracepoint__sched_process_fork 265 265 +0 (+0.00%) 19 19 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 18782 18970 +188 (+1.00%) 1286 1286 +0 (+0.00%) kprobe__proc_sys_write 2700 2809 +109 (+4.04%) 107 109 +2 (+1.87%) kprobe__vfs_link 12238 12366 +128 (+1.05%) 267 269 +2 (+0.75%) kprobe__vfs_symlink 7139 7365 +226 (+3.17%) 167 175 +8 (+4.79%) kprobe_ret__do_filp_open 7264 7070 -194 (-2.67%) 180 182 +2 (+1.11%) raw_tracepoint__sched_process_exec 3768 3453 -315 (-8.36%) 211 199 -12 (-5.69%) raw_tracepoint__sched_process_exit 3138 3138 +0 (+0.00%) 83 83 +0 (+0.00%) raw_tracepoint__sched_process_fork 265 265 +0 (+0.00%) 19 19 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 26679 24327 -2352 (-8.82%) 1067 1037 -30 (-2.81%) kprobe__proc_sys_write 1833 1833 +0 (+0.00%) 157 157 +0 (+0.00%) kprobe__vfs_link 9995 10127 +132 (+1.32%) 803 803 +0 (+0.00%) kprobe__vfs_symlink 5606 5672 +66 (+1.18%) 451 451 +0 (+0.00%) kprobe_ret__do_filp_open 5716 5782 +66 (+1.15%) 462 462 +0 (+0.00%) raw_tracepoint__sched_process_exec 3042 3042 +0 (+0.00%) 278 278 +0 (+0.00%) raw_tracepoint__sched_process_exit 1680 1680 +0 (+0.00%) 146 146 +0 (+0.00%) raw_tracepoint__sched_process_fork 299 299 +0 (+0.00%) 25 25 +0 (+0.00%) tracepoint__syscalls__sys_enter_kill 18372 18372 +0 (+0.00%) 1558 1558 +0 (+0.00%) default (mcpu=v3), no_alu32, cpuv4 have similar differences. Note one place where bpf_nop_mov() is used to workaround the verifier lack of link between the scalar register and its spill to stack. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231226191148.48536-7-alexei.starovoitov@gmail.com
2024-01-03	selftests/bpf: Convert exceptions_assert.c to bpf_cmp	Alexei Starovoitov
	Convert exceptions_assert.c to bpf_cmp_unlikely() macro. Since bpf_assert(bpf_cmp_unlikely(var, ==, 100)); other code; will generate assembly code: if r1 == 100 goto L2; r0 = 0 call bpf_throw L1: other code; ... L2: goto L1; LLVM generates redundant basic block with extra goto. LLVM will be fixed eventually. Right now it's less efficient than __bpf_assert(var, ==, 100) macro that produces: if r1 == 100 goto L1; r0 = 0 call bpf_throw L1: other code; But extra goto doesn't hurt the verification process. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20231226191148.48536-4-alexei.starovoitov@gmail.com
2024-01-03	bpf: Introduce "volatile compare" macros	Alexei Starovoitov
	Compilers optimize conditional operators at will, but often bpf programmers want to force compilers to keep the same operator in asm as it's written in C. Introduce bpf_cmp_likely/unlikely(var1, conditional_op, var2) macros that can be used as: - if (seen >= 1000) + if (bpf_cmp_unlikely(seen, >=, 1000)) The macros take advantage of BPF assembly that is C like. The macros check the sign of variable 'seen' and emits either signed or unsigned compare. For example: int a; bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX s> 0 goto' in BPF assembly. unsigned int a; bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX > 0 goto' in BPF assembly. C type conversions coupled with comparison operator are tricky. int i = -1; unsigned int j = 1; if (i < j) // this is false. long i = -1; unsigned int j = 1; if (i < j) // this is true. Make sure BPF program is compiled with -Wsign-compare then the macros will catch the mistake. The macros check LHS (left hand side) only to figure out the sign of compare. 'if 0 < rX goto' is not allowed in the assembly, so the users have to use a variable on LHS anyway. The patch updates few tests to demonstrate the use of the macros. The macro allows to use BPF_JSET in C code, since LLVM doesn't generate it at present. For example: if (i & j) compiles into r0 &= r1; if r0 == 0 goto while if (bpf_cmp_unlikely(i, &, j)) compiles into if r0 & r1 goto Note that the macros has to be careful with RHS assembly predicate. Since: u64 __rhs = 1ull << 42; asm goto("if r0 < %[rhs] goto +1" :: [rhs] "ri" (__rhs)); LLVM will silently truncate 64-bit constant into s32 imm. Note that [lhs] "r"((short)LHS) the type cast is a workaround for LLVM issue. When LHS is exactly 32-bit LLVM emits redundant <<=32, >>=32 to zero upper 32-bits. When LHS is 64 or 16 or 8-bit variable there are no shifts. When LHS is 32-bit the (u64) cast doesn't help. Hence use (short) cast. It does _not_ truncate the variable before it's assigned to a register. Traditional likely()/unlikely() macros that use __builtin_expect(!!(x), 1 or 0) have no effect on these macros, hence macros implement the logic manually. bpf_cmp_unlikely() macro preserves compare operator as-is while bpf_cmp_likely() macro flips the compare. Consider two cases: A. for() { if (foo >= 10) { bar += foo; } other code; } B. for() { if (foo >= 10) break; other code; } It's ok to use either bpf_cmp_likely or bpf_cmp_unlikely macros in both cases, but consider that 'break' is effectively 'goto out_of_the_loop'. Hence it's better to use bpf_cmp_unlikely in the B case. While 'bar += foo' is better to keep as 'fallthrough' == likely code path in the A case. When it's written as: A. for() { if (bpf_cmp_likely(foo, >=, 10)) { bar += foo; } other code; } B. for() { if (bpf_cmp_unlikely(foo, >=, 10)) break; other code; } The assembly will look like: A. for() { if r1 < 10 goto L1; bar += foo; L1: other code; } B. for() { if r1 >= 10 goto L2; other code; } L2: The bpf_cmp_likely vs bpf_cmp_unlikely changes basic block layout, hence it will greatly influence the verification process. The number of processed instructions will be different, since the verifier walks the fallthrough first. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20231226191148.48536-3-alexei.starovoitov@gmail.com
2024-01-03	selftests/bpf: Attempt to build BPF programs with -Wsign-compare	Alexei Starovoitov
	GCC's -Wall includes -Wsign-compare while clang does not. Since BPF programs are built with clang we need to add this flag explicitly to catch problematic comparisons like: int i = -1; unsigned int j = 1; if (i < j) // this is false. long i = -1; unsigned int j = 1; if (i < j) // this is true. C standard for reference: - If either operand is unsigned long the other shall be converted to unsigned long. - Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int. - Otherwise, if either operand is long, the other shall be converted to long. - Otherwise, if either operand is unsigned, the other shall be converted to unsigned. Unfortunately clang's -Wsign-compare is very noisy. It complains about (s32)a == (u32)b which is safe and doen't have surprising behavior. This patch fixes some of the issues. It needs a follow up to fix the rest. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20231226191148.48536-2-alexei.starovoitov@gmail.com
2024-01-03	bpf: Add a possibly-zero-sized read test	Andrei Matei
	This patch adds a test for the condition that the previous patch mucked with - illegal zero-sized helper memory access. As opposed to existing tests, this new one uses a size whose lower bound is zero, as opposed to a known-zero one. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231221232225.568730-3-andreimatei1@gmail.com
2024-01-03	bpf: Simplify checking size of helper accesses	Andrei Matei
	This patch simplifies the verification of size arguments associated to pointer arguments to helpers and kfuncs. Many helpers take a pointer argument followed by the size of the memory access performed to be performed through that pointer. Before this patch, the handling of the size argument in check_mem_size_reg() was confusing and wasteful: if the size register's lower bound was 0, then the verification was done twice: once considering the size of the access to be the lower-bound of the respective argument, and once considering the upper bound (even if the two are the same). The upper bound checking is a super-set of the lower-bound checking(), except: the only point of the lower-bound check is to handle the case where zero-sized-accesses are explicitly not allowed and the lower-bound is zero. This static condition is now checked explicitly, replacing a much more complex, expensive and confusing verification call to check_helper_mem_access(). Error messages change in this patch. Before, messages about illegal zero-size accesses depended on the type of the pointer and on other conditions, and sometimes the message was plain wrong: in some tests that changed you'll see that the old message was something like "R1 min value is outside of the allowed memory range", where R1 is the pointer register; the error was wrongly claiming that the pointer was bad instead of the size being bad. Other times the information that the size came for a register with a possible range of values was wrong, and the error presented the size as a fixed zero. Now the errors refer to the right register. However, the old error messages did contain useful information about the pointer register which is now lost; recovering this information was deemed not important enough. () Besides standing to reason that the checks for a bigger size access are a super-set of the checks for a smaller size access, I have also mechanically verified this by reading the code for all types of pointers. I could convince myself that it's true for all but PTR_TO_BTF_ID (check_ptr_to_btf_access). There, simply looking line-by-line does not immediately prove what we want. If anyone has any qualms, let me know. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231221232225.568730-2-andreimatei1@gmail.com
2023-12-20	selftests/bpf: Remove tests for zeroed-array kptr	Hou Tao
	bpf_mem_alloc() doesn't support zero-sized allocation, so removing these tests from test_bpf_ma test. After the removal, there will no definition for bin_data_8, so remove 8 from data_sizes array and adjust the index of data_btf_ids array in all test cases accordingly. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231216131052.27621-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	selftests/bpf: add freplace of BTF-unreliable main prog test	Andrii Nakryiko
	Add a test validating that freplace'ing another main (entry) BPF program fails if the target BPF program doesn't have valid/expected func proto BTF. We extend fexit_bpf2bpf test to allow to specify expected log message for negative test cases (where freplace program is expected to fail to load). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	selftests/bpf: add global subprog annotation tests	Andrii Nakryiko
	Add test cases to validate semantics of global subprog argument annotations: - non-null pointers; - context argument; - const dynptr passing; - packet pointers (data, metadata, end). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	bpf: reuse subprog argument parsing logic for subprog call checks	Andrii Nakryiko
	Remove duplicated BTF parsing logic when it comes to subprog call check. Instead, use (potentially cached) results of btf_prepare_func_args() to abstract away expectations of each subprog argument in generic terms (e.g., "this is pointer to context", or "this is a pointer to memory of size X"), and then use those simple high-level argument type expectations to validate actual register states to check if they match expectations. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	bpf: reuse btf_prepare_func_args() check for main program BTF validation	Andrii Nakryiko
	Instead of btf_check_subprog_arg_match(), use btf_prepare_func_args() logic to validate "trustworthiness" of main BPF program's BTF information, if it is present. We ignored results of original BTF check anyway, often times producing confusing and ominously-sounding "reg type unsupported for arg#0 function" message, which has no apparent effect on program correctness and verification process. All the -EFAULT returning sanity checks are already performed in check_btf_info_early(), so there is zero reason to have this duplication of logic between btf_check_subprog_call() and btf_check_subprog_arg_match(). Dropping btf_check_subprog_arg_match() simplifies btf_check_func_arg_match() further removing `bool processing_call` flag. One subtle bit that was done by btf_check_subprog_arg_match() was potentially marking main program's BTF as unreliable. We do this explicitly now with a dedicated simple check, preserving the original behavior, but now based on well factored btf_prepare_func_args() logic. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	selftests/bpf: add testcase to verifier_bounds.c for BPF_JNE	Menglong Dong
	Add testcase for the logic that the verifier tracks the BPF_JNE for regs. The assembly function "reg_not_equal_const()" and "reg_equal_const" that we add is exactly converted from the following case: u32 a = bpf_get_prandom_u32(); u64 b = 0; a %= 8; /* the "a > 0" here will be optimized to "a != 0" / if (a > 0) { / now the range of a should be [1, 7] */ bpf_skb_store_bytes(skb, 0, &b, a, 0); } Signed-off-by: Menglong Dong <menglong8.dong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231219134800.1550388-5-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19	Revert BPF token-related functionality	Andrii Nakryiko
	This patch includes the following revert (one conflicting BPF FS patch and three token patch sets, represented by merge commits): - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'"; - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs"; - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'"; - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'". Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-18	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-12-18 This PR is larger than usual and contains changes in various parts of the kernel. The main changes are: 1) Fix kCFI bugs in BPF, from Peter Zijlstra. End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. 2) Introduce BPF token object, from Andrii Nakryiko. It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore. Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp 3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei. - Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic 4) Support for VLAN tag in XDP hints, from Larysa Zaremba. 5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu. End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline. 6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao. 7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu. It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics. 8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao. 9) BPF file verification via fsverity, from Song Liu. It allows BPF progs get fsverity digest. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ==================== Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18	bpf: Ensure precise is reset to false in __mark_reg_const_zero()	Andrii Nakryiko
	It is safe to always start with imprecise SCALAR_VALUE register. Previously __mark_reg_const_zero() relied on caller to reset precise mark, but it's very error prone and we already missed it in a few places. So instead make __mark_reg_const_zero() reset precision always, as it's a safe default for SCALAR_VALUE. Explanation is basically the same as for why we are resetting (or rather not setting) precision in current state. If necessary, precision propagation will set it to precise correctly. As such, also remove a big comment about forward precision propagation in mark_reg_stack_read() and avoid unnecessarily setting precision to true after reading from STACK_ZERO stack. Again, precision propagation will correctly handle this, if that SCALAR_VALUE register will ever be needed to be precise. Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231218173601.53047-1-andrii@kernel.org
2023-12-18	selftests/bpf: Test the release of map btf	Hou Tao
	When there is bpf_list_head or bpf_rb_root field in map value, the free of map btf and the free of map value may run concurrently and there may be use-after-free problem, so add two test cases to demonstrate it. And the use-after-free problem can been easily reproduced by using bpf_next tree and a KASAN-enabled kernel. The first test case tests the racing between the free of map btf and the free of array map. It constructs the racing by releasing the array map in the end after other ref-counter of map btf has been released. To delay the free of array map and make it be invoked after btf_free_rcu() is invoked, it stresses system_unbound_wq by closing multiple percpu array maps before it closes the array map. The second case tests the racing between the free of map btf and the free of inner map. Beside using the similar method as the first one does, it uses bpf_map_delete_elem() to delete the inner map and to defer the release of inner map after one RCU grace period. The reason for using two skeletons is to prevent the release of outer map and inner map in map_in_map_btf.c interfering the release of bpf map in normal_map_btf.c. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231216035510.4030605-1-houtao@huaweicloud.com
2023-12-14	bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()	Daniel Xu
	This commit extends test_tunnel selftest to test the new XDP xfrm state lookup kfunc. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/e704e9a4332e3eac7b458e4bfdec8fcc6984cdb6.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-14	bpf: selftests: Move xfrm tunnel test to test_progs	Daniel Xu
	test_progs is better than a shell script b/c C is a bit easier to maintain than shell. Also it's easier to use new infra like memory mapped global variables from C via bpf skeleton. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/a350db9e08520c64544562d88ec005a039124d9b.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-14	bpf: selftests: test_tunnel: Use vmlinux.h declarations	Daniel Xu
	vmlinux.h declarations are more ergnomic, especially when working with kfuncs. The uapi headers are often incomplete for kfunc definitions. This commit also switches bitfield accesses to use CO-RE helpers. Switching to vmlinux.h definitions makes the verifier very unhappy with raw bitfield accesses. The error is: ; md.u.md2.dir = direction; 33: (69) r1 = (u16 )(r2 +11) misaligned stack access off (0x0; 0x0)+-64+11 size 2 Fix by using CO-RE-aware bitfield reads and writes. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/884bde1d9a351d126a3923886b945ea6b1b0776b.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: Check VLAN tag and proto in xdp_metadata	Larysa Zaremba
	Verify, whether VLAN tag and proto are set correctly. To simulate "stripped" VLAN tag on veth, send test packet from VLAN interface. Also, add TO_STR() macro for convenience. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-19-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: Add flags and VLAN hint to xdp_hw_metadata	Larysa Zaremba
	Add VLAN hint to the xdp_hw_metadata program. Also, to make metadata layout more straightforward, add flags field to pass information about validity of every separate hint separately. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-17-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: Allow VLAN packets in xdp_hw_metadata	Larysa Zaremba
	Make VLAN c-tag and s-tag XDP hint testing more convenient by not skipping VLAN-ed packets. Allow both 802.1ad and 802.1Q headers. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-16-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: add BPF object loading tests with explicit token passing	Andrii Nakryiko
	Add a few tests that attempt to load BPF object containing privileged map, program, and the one requiring mandatory BTF uploading into the kernel (to validate token FD propagation to BPF_BTF_LOAD command). Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	bpf: selftests: Add verifier tests for CO-RE bitfield writes	Daniel Xu
	Add some tests that exercise BPF_CORE_WRITE_BITFIELD() macro. Since some non-trivial bit fiddling is going on, make sure various edge cases (such as adjacent bitfields and bitfields at the edge of structs) are exercised. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/72698a1080fa565f541d5654705255984ea2a029.1702325874.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13	bpf: selftests: test_loader: Support __btf_path() annotation	Daniel Xu
	This commit adds support for per-prog btf_custom_path. This is necessary for testing CO-RE relocations on non-vmlinux types using test_loader infrastructure. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/660ea7f2fdbdd5103bc1af87c9fc931f05327926.1702325874.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-11	selftests/bpf: validate eliminated global subprog is not freplaceable	Andrii Nakryiko
	Add selftest that establishes dead code-eliminated valid global subprog (global_dead) and makes sure that it's not possible to freplace it, as it's effectively not there. This test will fail with unexpected success before 2afae08c9dcb ("bpf: Validate global subprogs lazily"). v2->v3: - add missing err assignment (Alan); - undo unnecessary signature changes in verifier_global_subprogs.c (Eduard); v1->v2: - don't rely on assembly output in verifier log, which changes between compiler versions (CI). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231211174131.2324306-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	selftests/bpf: Add test for bpf_cpumask_weight() kfunc	David Vernet
	The new bpf_cpumask_weight() kfunc can be used to count the number of bits that are set in a struct cpumask* kptr. Let's add a selftest to verify its behavior. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231207210843.168466-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	selftests/bpf: validate fake register spill/fill precision backtracking logic	Andrii Nakryiko
	Add two tests validating that verifier's precision backtracking logic handles BPF_ST_MEM instructions that produce fake register spill into register slot. This is happening when non-zero constant is written directly to a slot, e.g., (u64 )(r10 -8) = 123. Add both full 64-bit register spill, as well as 32-bit "sub-spill". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231209010958.66758-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-08	selftests/bpf: Add selftests for cgroup1 local storage	Yafang Shao
	Expanding the test coverage from cgroup2 to include cgroup1. The result as follows, Already existing test cases for cgroup2: #48/1 cgrp_local_storage/tp_btf:OK #48/2 cgrp_local_storage/attach_cgroup:OK #48/3 cgrp_local_storage/recursion:OK #48/4 cgrp_local_storage/negative:OK #48/5 cgrp_local_storage/cgroup_iter_sleepable:OK #48/6 cgrp_local_storage/yes_rcu_lock:OK #48/7 cgrp_local_storage/no_rcu_lock:OK Expanded test cases for cgroup1: #48/8 cgrp_local_storage/cgrp1_tp_btf:OK #48/9 cgrp_local_storage/cgrp1_recursion:OK #48/10 cgrp_local_storage/cgrp1_negative:OK #48/11 cgrp_local_storage/cgrp1_iter_sleepable:OK #48/12 cgrp_local_storage/cgrp1_yes_rcu_lock:OK #48/13 cgrp_local_storage/cgrp1_no_rcu_lock:OK Summary: #48 cgrp_local_storage:OK Summary: 1/13 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231206115326.4295-4-laoar.shao@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-08	selftests/bpf: fix timer/test_bad_ret subtest on test_progs-cpuv4 flavor	Andrii Nakryiko
	Because test_bad_ret main program is not written in assembly, we don't control instruction indices in timer_cb_ret_bad() subprog. This bites us in timer/test_bad_ret subtest, where we see difference between cpuv4 and other flavors. For now, make __msg() expectations not rely on instruction indices by anchoring them around bpf_get_prandom_u32 call. Once we have regex/glob support for __msg(), this can be expressed a bit more nicely, but for now just mitigating the problem with available means. Fixes: e02dea158dda ("selftests/bpf: validate async callback return value check correctness") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231208233028.3412690-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-08	bpf: Fix accesses to uninit stack slots	Andrei Matei
	Privileged programs are supposed to be able to read uninitialized stack memory (ever since 6715df8d5) but, before this patch, these accesses were permitted inconsistently. In particular, accesses were permitted above state->allocated_stack, but not below it. In other words, if the stack was already "large enough", the access was permitted, but otherwise the access was rejected instead of being allowed to "grow the stack". This undesired rejection was happening in two places: - in check_stack_slot_within_bounds() - in check_stack_range_initialized() This patch arranges for these accesses to be permitted. A bunch of tests that were relying on the old rejection had to change; all of them were changed to add also run unprivileged, in which case the old behavior persists. One tests couldn't be updated - global_func16 - because it can't run unprivileged for other reasons. This patch also fixes the tracking of the stack size for variable-offset reads. This second fix is bundled in the same commit as the first one because they're inter-related. Before this patch, writes to the stack using registers containing a variable offset (as opposed to registers with fixed, known values) were not properly contributing to the function's needed stack size. As a result, it was possible for a program to verify, but then to attempt to read out-of-bounds data at runtime because a too small stack had been allocated for it. Each function tracks the size of the stack it needs in bpf_subprog_info.stack_depth, which is maintained by update_stack_depth(). For regular memory accesses, check_mem_access() was calling update_state_depth() but it was passing in only the fixed part of the offset register, ignoring the variable offset. This was incorrect; the minimum possible value of that register should be used instead. This tracking is now fixed by centralizing the tracking of stack size in grow_stack_state(), and by lifting the calls to grow_stack_state() to check_stack_access_within_bounds() as suggested by Andrii. The code is now simpler and more convincingly tracks the correct maximum stack size. check_stack_range_initialized() can now rely on enough stack having been allocated for the access; this helps with the fix for the first issue. A few tests were changed to also check the stack depth computation. The one that fails without this patch is verifier_var_off:stack_write_priv_vs_unpriv. Fixes: 01f810ace9ed3 ("bpf: Allow variable-offset stack access") Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231208032519.260451-3-andreimatei1@gmail.com Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKOY+Q@mail.gmail.com/
2023-12-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h 37e4b8df27bc ("net: stmmac: fix FPE events losing") c3f3b97238f6 ("net: stmmac: Refactor EST implementation") https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/ Adjacent changes: net/ipv4/tcp_ao.c 9396c4ee93f9 ("net/tcp: Don't store TCP-AO maclen on reqsk") 7b0f570f879a ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07	bpf: Add verifier regression test for previous patch	Andrei Matei
	Add a regression test for var-off zero-sized reads. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231207041150.229139-3-andreimatei1@gmail.com
2023-12-06	selftests/bpf: Add test for early update in prog_array_map_poke_run	Jiri Olsa
	Adding test that tries to trigger the BUG_IN during early map update in prog_array_map_poke_run function. The idea is to share prog array map between thread that constantly updates it and another one loading a program that uses that prog array. Eventually we will hit a place where the program is ok to be updated (poke->tailcall_target_stable check) but the address is still not registered in kallsyms, so the bpf_arch_text_poke returns -EINVAL and cause imbalance for the next tail call update check, which will fail with -EBUSY in bpf_arch_text_poke as described in previous fix. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20231206083041.1306660-3-jolsa@kernel.org
2023-12-05	selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros	Andrii Nakryiko
	Enhance partial_stack_load_preserves_zeros subtest with detailed precision propagation log checks. We know expect fp-16 to be spilled, initially imprecise, zero const register, which is later marked as precise even when partial stack slot load is performed, even if it's not a register fill (!). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05	selftests/bpf: validate zero preservation for sub-slot loads	Andrii Nakryiko
	Validate that 1-, 2-, and 4-byte loads from stack slots not aligned on 8-byte boundary still preserve zero, when loading from all-STACK_ZERO sub-slots, or when stack sub-slots are covered by spilled register with known constant zero value. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05	selftests/bpf: validate STACK_ZERO is preserved on subreg spill	Andrii Nakryiko
	Add tests validating that STACK_ZERO slots are preserved when slot is partially overwritten with subregister spill. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05	selftests/bpf: add stack access precision test	Andrii Nakryiko
	Add a new selftests that validates precision tracking for stack access instruction, using both r10-based and non-r10-based accesses. For non-r10 ones we also make sure to have non-zero var_off to validate that final stack offset is tracked properly in instruction history information inside verifier. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05	bpf: support non-r10 register spill/fill to/from stack in precision tracking	Andrii Nakryiko
	Use instruction (jump) history to record instructions that performed register spill/fill to/from stack, regardless if this was done through read-only r10 register, or any other register after copying r10 into it and potentially adjusting offset. To make this work reliably, we push extra per-instruction flags into instruction history, encoding stack slot index (spi) and stack frame number in extra 10 bit flags we take away from prev_idx in instruction history. We don't touch idx field for maximum performance, as it's checked most frequently during backtracking. This change removes basically the last remaining practical limitation of precision backtracking logic in BPF verifier. It fixes known deficiencies, but also opens up new opportunities to reduce number of verified states, explored in the subsequent patches. There are only three differences in selftests' BPF object files according to veristat, all in the positive direction (less states). File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) -------------------------------------- ------------- --------- --------- ------------- ---------- ---------- ------------- test_cls_redirect_dynptr.bpf.linked3.o cls_redirect 2987 2864 -123 (-4.12%) 240 231 -9 (-3.75%) xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82848 82661 -187 (-0.23%) 5107 5073 -34 (-0.67%) xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 85116 84964 -152 (-0.18%) 5162 5130 -32 (-0.62%) Note, I avoided renaming jmp_history to more generic insn_hist to minimize number of lines changed and potential merge conflicts between bpf and bpf-next trees. Notice also cur_hist_entry pointer reset to NULL at the beginning of instruction verification loop. This pointer avoids the problem of relying on last jump history entry's insn_idx to determine whether we already have entry for current instruction or not. It can happen that we added jump history entry because current instruction is_jmp_point(), but also we need to add instruction flags for stack access. In this case, we don't want to entries, so we need to reuse last added entry, if it is present. Relying on insn_idx comparison has the same ambiguity problem as the one that was fixed recently in [0], so we avoid that. [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/ Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reported-by: Tao Lyu <tao.lyu@epfl.ch> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05	selftests/bpf: Make sure we trigger metadata kfuncs for dst 8080	Stanislav Fomichev
	xdp_metadata test is flaky sometimes: verify_xsk_metadata:FAIL:rx_hash_type unexpected rx_hash_type: actual 8 != expected 0 Where 8 means XDP_RSS_TYPE_L4_ANY and is exported from veth driver only when 'skb->l4_hash' condition is met. This makes me think that the program is triggering again for some other packet. Let's have a filter, similar to xdp_hw_metadata, where we trigger XDP kfuncs only for UDP packets destined to port 8080. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231204174423.3460052-1-sdf@google.com
2023-12-05	selftests/bpf: Test bpf_kptr_xchg stashing of bpf_rb_root	Dave Marchevsky
	There was some confusion amongst Meta sched_ext folks regarding whether stashing bpf_rb_root - the tree itself, rather than a single node - was supported. This patch adds a small test which demonstrates this functionality: a local kptr with rb_root is created, a node is created and added to the tree, then the tree is kptr_xchg'd into a mapval. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231204211722.571346-1-davemarchevsky@fb.com
2023-12-04	selftests/bpf: Test outer map update operations in syscall program	Hou Tao
	Syscall program is running with rcu_read_lock_trace being held, so if bpf_map_update_elem() or bpf_map_delete_elem() invokes synchronize_rcu_tasks_trace() when operating on an outer map, there will be dead-lock, so add a test to guarantee that it is dead-lock free. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-8-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-04	selftests/bpf: Add test cases for inner map	Hou Tao
	Add test cases to test the race between the destroy of inner map due to map-in-map update and the access of inner map in bpf program. The following 4 combinations are added: (1) array map in map array + bpf program (2) array map in map array + sleepable bpf program (3) array map in map htab + bpf program (4) array map in map htab + sleepable bpf program Before applying the fixes, when running `./test_prog -a map_in_map`, the following error was reported: ================================================================== BUG: KASAN: slab-use-after-free in array_map_update_elem+0x48/0x3e0 Read of size 4 at addr ffff888114f33824 by task test_progs/1858 CPU: 1 PID: 1858 Comm: test_progs Tainted: G O 6.6.0+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... Call Trace: <TASK> dump_stack_lvl+0x4a/0x90 print_report+0xd2/0x620 kasan_report+0xd1/0x110 __asan_load4+0x81/0xa0 array_map_update_elem+0x48/0x3e0 bpf_prog_be94a9f26772f5b7_access_map_in_array+0xe6/0xf6 trace_call_bpf+0x1aa/0x580 kprobe_perf_func+0xdd/0x430 kprobe_dispatcher+0xa0/0xb0 kprobe_ftrace_handler+0x18b/0x2e0 0xffffffffc02280f7 RIP: 0010:__x64_sys_getpgid+0x1/0x30 ...... </TASK> Allocated by task 1857: kasan_save_stack+0x26/0x50 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x30 __kasan_kmalloc+0x98/0xa0 __kmalloc_node+0x6a/0x150 __bpf_map_area_alloc+0x141/0x170 bpf_map_area_alloc+0x10/0x20 array_map_alloc+0x11f/0x310 map_create+0x28a/0xb40 __sys_bpf+0x753/0x37c0 __x64_sys_bpf+0x44/0x60 do_syscall_64+0x36/0xb0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Freed by task 11: kasan_save_stack+0x26/0x50 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x50 __kasan_slab_free+0x113/0x190 slab_free_freelist_hook+0xd7/0x1e0 __kmem_cache_free+0x170/0x260 kfree+0x9b/0x160 kvfree+0x2d/0x40 bpf_map_area_free+0xe/0x20 array_map_free+0x120/0x2c0 bpf_map_free_deferred+0xd7/0x1e0 process_one_work+0x462/0x990 worker_thread+0x370/0x670 kthread+0x1b0/0x200 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 Last potentially related work creation: kasan_save_stack+0x26/0x50 __kasan_record_aux_stack+0x94/0xb0 kasan_record_aux_stack_noalloc+0xb/0x20 __queue_work+0x331/0x950 queue_work_on+0x75/0x80 bpf_map_put+0xfa/0x160 bpf_map_fd_put_ptr+0xe/0x20 bpf_fd_array_map_update_elem+0x174/0x1b0 bpf_map_update_value+0x2b7/0x4a0 __sys_bpf+0x2551/0x37c0 __x64_sys_bpf+0x44/0x60 do_syscall_64+0x36/0xb0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-7-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-02	bpf: simplify tnum output if a fully known constant	Andrii Nakryiko
	Emit tnum representation as just a constant if all bits are known. Use decimal-vs-hex logic to determine exact format of emitted constant value, just like it's done for register range values. For that move tnum_strn() to kernel/bpf/log.c to reuse decimal-vs-hex determination logic and constants. Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231202175705.885270-12-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>