path: root/tools
2024-04-01  doc: netlink: Add hyperlinks to generated Netlink docs  (Donald Hunter)
Update ynl-gen-rst to generate hyperlinks to definitions, attribute sets and sub-messages from all the places that reference them. Note that there is a single label namespace for all of the kernel docs. Hyperlinks within a single netlink doc need to be qualified by the family name to avoid collisions. The label format is 'family-type-name' which gives, for example, 'rt-link-attribute-set-link-attrs' as the link id. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240329135021.52534-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01  doc: netlink: Change generated docs to limit TOC to depth 3  (Donald Hunter)
The tables of contents in the generated Netlink docs include individual attribute definitions. This can make the contents exceedingly long and repeats a lot of what is on the rest of the pages. See for example: https://docs.kernel.org/networking/netlink_spec/tc.html

Add a depth limit to the contents directive in generated .rst files to limit the contents depth to 3 levels. This reduces the contents to:

  - Family
    - Summary
    - Operations
      - op-one
      - op-two
      - ...
    - Definitions
      - struct-one
      - struct-two
      - enum-one
      - ...
    - Attribute sets
      - attrs-one
      - attrs-two
      - ...

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240329135021.52534-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01  selftests: mptcp: join: fix dev in check_endpoint  (Geliang Tang)
There's a bug in pm_nl_check_endpoint(): 'dev' wasn't being parsed correctly. If it is also called in the 2nd test of endpoint_tests(), it fails with an error like this:

  creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \
                  found '10.0.2.2 id 2 subflow dev ns2eth2'

The reason is that '$2' should be set to 'dev', not '$1'. This patch fixes it.

Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01  mptcp: don't account accept() of non-MPC client as fallback to TCP  (Davide Caratti)
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX. MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server. Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01  selftests: reuseaddr_conflict: add missing new line at the end of the output  (Jakub Kicinski)
The netdev CI runs in a VM and captures serial, so stdout and stderr get combined. Because there's a missing new line in stderr the test ends up corrupting KTAP:

  # Successok 1 selftests: net: reuseaddr_conflict

which should have been:

  # Success
  ok 1 selftests: net: reuseaddr_conflict

Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
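As an aside to the commit above, the failure mode is easy to reproduce with a few lines of C. This is a minimal editorial sketch, not the selftest's actual source; the point is simply that a diagnostic written to stderr without a trailing newline fuses with the harness's next stdout line once the two streams are merged on a serial console.

  /* Minimal sketch, not the reuseaddr_conflict test itself. */
  #include <stdio.h>

  int main(void)
  {
          /* Buggy form: the harness's next "ok 1 ..." stdout line would
           * be appended directly after "Success".
           * fprintf(stderr, "Success");
           */

          /* Fixed form: terminate the diagnostic line explicitly. */
          fprintf(stderr, "Success\n");
          return 0;
  }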
2024-04-01  bitmap: introduce generic optimized bitmap_size()  (Alexander Lobakin)
The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both dividend and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is, however, the compilers sometimes tend to generate suboptimal code (GCC 13):

  48 83 c0 3f             add    $0x3f,%rax
  48 c1 e8 06             shr    $0x6,%rax
  48 8d 14 c5 00 00 00 00 lea    0x0(,%rax,8),%rdx

%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:

  8d 50 3f                lea    0x3f(%rax),%edx
  c1 ea 03                shr    $0x3,%edx
  81 e2 f8 ff ff 1f       and    $0x1ffffff8,%edx

Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter:

  add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)

Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes:

  add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)

Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
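To make the ALIGN()-based construction above concrete, here is a compiler-friendly sketch of such a helper. It follows the description in the commit (round nbits up to a multiple of BITS_PER_LONG, then divide by 8); the actual in-tree definition may differ in naming and typing details.

  /* Sketch only; the in-tree bitmap_size() may differ in detail. */
  #include <limits.h>
  #include <stddef.h>

  #define BITS_PER_BYTE   CHAR_BIT
  #define BITS_PER_LONG   (sizeof(long) * BITS_PER_BYTE)
  #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

  static inline size_t bitmap_size(size_t nbits)
  {
          /* Aligning to a power of two lets the compiler emit a shift
           * and mask instead of a full division. */
          return ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE;
  }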
2024-04-01  tools: move alignment-related macros to new <linux/align.h>  (Alexander Lobakin)
Currently, tools have *ALIGN*() macros scattered across unrelated headers, as there are only 3 of them and they were added separately each time on an as-needed basis. Anyway, let's make it more consistent with the kernel headers and allow using those macros outside of the mentioned headers. Create <linux/align.h> inside the tools/ folder and include it where needed. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
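For readers who have not seen the kernel's alignment helpers, the macros being consolidated look roughly like the following. This is an illustrative sketch, not a verbatim copy of the new tools header.

  /* Illustrative power-of-two alignment helpers (not verbatim). */
  #define __ALIGN_MASK(x, mask)   (((x) + (mask)) & ~(mask))
  #define ALIGN(x, a)             __ALIGN_MASK(x, (__typeof__(x))(a) - 1)
  #define ALIGN_DOWN(x, a)        ((x) & ~((__typeof__(x))(a) - 1))
  #define IS_ALIGNED(x, a)        (((x) & ((__typeof__(x))(a) - 1)) == 0)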
2024-04-01  bitops: make BYTES_TO_BITS() treewide-available  (Alexander Lobakin)
Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had an implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
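The simplification mentioned above amounts to the following; a sketch assuming the pre- and post-patch shapes described in the commit message rather than quoting the exact tree contents.

  #include <limits.h>

  #define BITS_PER_BYTE           CHAR_BIT

  /* Before: BITS_PER_LONG / sizeof(long) always works out to 8, but the
   * sizeof() in the expression gives the result a long-sized unsigned type. */
  #define BYTES_TO_BITS_OLD(nb)   ((nb) * (sizeof(long) * BITS_PER_BYTE) / sizeof(long))

  /* After: plain bytes times bits-per-byte, int-typed for an int input. */
  #define BYTES_TO_BITS(nb)       ((nb) * BITS_PER_BYTE)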
2024-03-30  objtool: Fix compile failure when using the x32 compiler  (Mikulas Patocka)
When compiling the v6.9-rc1 kernel with the x32 compiler, the following errors are reported. The reason is that we take an "unsigned long" variable and print it using "PRIx64" format string.

  In file included from check.c:16:
  check.c: In function ‘add_dead_ends’:
  /usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
     46 |         "%s: warning: objtool: " format "\n", \
        |         ^~~~~~~~~~~~~~~~~~~~~~~~
  check.c:613:33: note: in expansion of macro ‘WARN’
    613 |         WARN("can't find unreachable insn at %s+0x%" PRIx64,
        |         ^~~~
  ...

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-kernel@vger.kernel.org
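The mismatch is the generic "format expects a 64-bit argument" problem, so a small standalone illustration may help. This is not the exact objtool patch, just the usual way such a warning is silenced, with the variable name made up.

  /* Generic illustration of the warning class, not the objtool patch. */
  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned long offset = 0x613;   /* only 32 bits wide on x32 */

          /* Mismatch: PRIx64 expands to "llx", but 'offset' is just long.
           * printf("insn at +0x%" PRIx64 "\n", offset);
           */

          /* Cast to a fixed-width type so format and argument agree. */
          printf("insn at +0x%" PRIx64 "\n", (uint64_t)offset);
          return 0;
  }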
2024-03-29  selftests/bpf: make multi-uprobe tests work in RELEASE=1 mode  (Andrii Nakryiko)
When BPF selftests are built in RELEASE=1 mode with -O2 optimization level, the uprobe_multi binary called from the multi-uprobe tests is optimized to the point that all the thousands of target uprobe_multi_func_XXX functions are eliminated, breaking tests. So ensure they are preserved by using the weak attribute.

But, actually, compiling the uprobe_multi binary with -O2 takes a really long time, and is quite useless (it's not a benchmark). So in addition to ensuring that uprobe_multi_func_XXX functions are preserved, opt out of -O2 explicitly in the Makefile and stick to -O0. This saves a lot of compilation time.

With -O2, just recompiling uprobe_multi:

  $ touch uprobe_multi.c
  $ time make RELEASE=1 -j90
  make RELEASE=1 -j90  291.66s user 2.54s system 99% cpu 4:55.52 total

With -O0:

  $ touch uprobe_multi.c
  $ time make RELEASE=1 -j90
  make RELEASE=1 -j90  22.40s user 1.91s system 99% cpu 24.355 total

5 minutes vs (still slow, but...) 24 seconds.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240329190410.4191353-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-29  Merge tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest  (Linus Torvalds)
Pull kselftest fixes from Shuah Khan: "Fixes to seccomp and ftrace tests and a change to add config file for dmabuf-heap test to increase coverage"

* tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: dmabuf-heap: add config file for the test
  selftests/seccomp: Try to fit runtime of benchmark into timeout
  selftests/ftrace: Fix event filter target_func selection
2024-03-29  Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest  (Linus Torvalds)
Pull KUnit fixes from Shuah Khan: "One urgent fix for an --alltests build failure related to the renaming of CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED and the resulting missing config option"

* tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
2024-03-29  selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT.  (Kuniyuki Iwashima)
This patch adds two tests using SO_REUSEADDR and SO_REUSEPORT and defines the expected errno for each test case. SO_REUSEADDR/SO_REUSEPORT is set for the two per-fixture bind() calls. The notable pattern is the pair of v6only [::] and plain [::]. The two sockets are put into the same tb2, where the per-bucket v6only flag would be useless to detect a bind() conflict. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
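For readers unfamiliar with the knobs involved, the shape of such a bind() conflict check is sketched below. It is a condensed stand-alone illustration, not the kselftest-harness code: the real test iterates over many address/option variants and compares against a per-case expected errno.

  /* Condensed sketch of a bind() conflict check, not the selftest itself. */
  #include <arpa/inet.h>
  #include <errno.h>
  #include <netinet/in.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static int bound_socket(int reuseaddr, int reuseport, uint16_t port)
  {
          struct sockaddr_in addr = {
                  .sin_family = AF_INET,
                  .sin_addr.s_addr = htonl(INADDR_ANY),
                  .sin_port = htons(port),
          };
          int fd = socket(AF_INET, SOCK_STREAM, 0);

          setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr, sizeof(reuseaddr));
          setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &reuseport, sizeof(reuseport));
          if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                  close(fd);
                  return -errno;
          }
          return fd;
  }

  int main(void)
  {
          int fd1 = bound_socket(1, 1, 54321);
          int fd2 = bound_socket(1, 1, 54321);

          /* With SO_REUSEPORT set on both sockets (same user) the second
           * bind() should succeed; without it we would expect -EADDRINUSE. */
          printf("second bind: %s\n", fd2 >= 0 ? "ok" : strerror(-fd2));
          if (fd1 >= 0)
                  close(fd1);
          if (fd2 >= 0)
                  close(fd2);
          return 0;
  }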
2024-03-29  selftest: tcp: Add bind() tests for IPV6_V6ONLY.  (Kuniyuki Iwashima)
bhash2 was not well tested for IPv6-only sockets. This patch adds test cases where we set IPV6_V6ONLY for per-fixture bind() calls if variant->ipv6_only[i] is true. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftest: tcp: Add more bind() calls.  (Kuniyuki Iwashima)
In addition to the two addresses defined in the fixtures, this patch adds 6 more bind() calls:

  * 0.0.0.0
  * 127.0.0.1
  * ::
  * ::1
  * ::ffff:0.0.0.0
  * ::ffff:127.0.0.1

The first two per-fixture bind() calls control how inet_bind2_bucket is created, and the remaining 6 bind() calls cover as many conflicting patterns as possible.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftest: tcp: Add v4-v4 and v6-v6 bind() conflict tests.  (Kuniyuki Iwashima)
We don't have bind() conflict tests for the same protocol pairs. Let's add them except for the same address pair, which will be covered by the following patch adding 6 more bind() calls for each test case. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftest: tcp: Define the reverse order bind() tests explicitly.  (Kuniyuki Iwashima)
Currently, bind_wildcard.c calls bind() twice for two addresses and checks the pre-defined errno against the 2nd call. Also, the two bind() calls are swapped to cover the various patterns of how bind buckets are created. However, only testing two addresses is insufficient to detect regressions. So, we will add more bind() calls, and then we need to define a different errno for each bind() per test case. As a preparation, let's define the reverse order bind() test cases as fixtures. No functional changes are intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftest: tcp: Make bind() selftest flexible.  (Kuniyuki Iwashima)
Currently, bind_wildcard.c tests only (IPv4, IPv6) pairs, but we will add more tests for the same protocol pairs. This patch makes it possible by changing the address pointer to void. No functional changes are intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftests: dmabuf-heap: add config file for the test  (Muhammad Usama Anjum)
The config fragment lists all the config options needed for the test. This config is merged into the config of the kernel on which this test is run. Fixed whitespace errors during commit: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29  selftests/seccomp: Try to fit runtime of benchmark into timeout  (Mark Brown)
The seccomp benchmark runs five scenarios, one calibration run with no seccomp filters enabled then four further runs each adding a filter. The calibration run times itself for 15s and then each additional run executes for the same number of times.

Currently the seccomp tests, including the benchmark, run with an extended 120s timeout but this is not sufficient to robustly run the tests on a lot of platforms. Sample timings from some recent runs:

  Platform           Run 1   Run 2   Run 3   Run 4
  ---------          -----   -----   -----   -----
  PowerEdge R200     16.6s   16.6s   31.6s   37.4s
  BBB (arm)          20.4s   20.4s   54.5s
  Synquacer (arm64)  20.7s   23.7s   40.3s

The x86 runs from the PowerEdge are quite marginal and routinely fail; for the successful run reported here the timed portions of the run are at 117.2s, leaving less than 3s of margin which is frequently breached. The added overhead of adding filters on the other platforms is such that there is no prospect of their runs fitting into the 120s timeout, especially on 32 bit arm where there is no BPF JIT. While we could lower the time we calibrate for, the runs that currently do complete are already reporting issues with the per-filter overheads not matching expectations.

Let's instead raise the timeout to 180s, which is only a 50% increase on the current timeout, which is itself not *too* large given that there are only two tests in this suite.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29  selftests/ftrace: Fix event filter target_func selection  (Mark Rutland)
The event filter function test has been failing in our internal test farm:

  | # not ok 33 event filter function - test event filtering on functions

Running the test in verbose mode indicates that this is because the test erroneously determines that kmem_cache_free() is the most common caller of kmem_cache_free():

  # # + cut -d: -f3 trace
  # # + sed s/call_site=([^+]*)+0x.*/1/
  # # + sort
  # # + uniq -c
  # # + sort
  # # + tail -n 1
  # # + sed s/^[ 0-9]*//
  # # + target_func=kmem_cache_free

... and as kmem_cache_free() doesn't call itself, setting this as the filter function for kmem_cache_free() results in no hits, and consequently the test fails:

  # # + grep kmem_cache_free trace
  # # + grep kmem_cache_free
  # # + wc -l
  # # + hitcnt=0
  # # + grep kmem_cache_free trace
  # # + grep -v kmem_cache_free
  # # + wc -l
  # # + misscnt=0
  # # + [ 0 -eq 0 ]
  # # + exit_fail

This seems to be because the system in question has tasks with ':' in their name (which a number of kernel worker threads have). These show up in the trace, e.g.

  test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache

... and so when we try to extract the call_site with:

  cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'

... the 'cut' command will extract the column containing 'kmem_cache_free' rather than the column containing 'call_site=...', and the 'sed' command will leave this unchanged. Consequently, the test will decide to use 'kmem_cache_free' as the filter function, resulting in the failure seen above.

Fix this by matching the 'call_site=<func>' part specifically to extract the function name.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29  tools/net/ynl: Add extack policy attribute decoding  (Donald Hunter)
The NLMSGERR_ATTR_POLICY extack attribute has been ignored by ynl up to now. Extend extack decoding to include _POLICY and the nested NL_POLICY_TYPE_ATTR_* attributes. For example:

  ./tools/net/ynl/cli.py \
      --spec Documentation/netlink/specs/rt_link.yaml \
      --create --do newlink --json '{
          "ifname": "12345678901234567890",
          "linkinfo": {"kind": "bridge"}
      }'
  Netlink error: Numerical result out of range
  nl_len = 104 (88) nl_flags = 0x300 nl_type = 2
    error: -34
    extack: {'msg': 'Attribute failed policy validation', 'policy': {'max-length': 15, 'type': 'string'}, 'bad-attr': '.ifname'}

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240328155636.64688-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29  selftest: af_unix: Test GC for SCM_RIGHTS.  (Kuniyuki Iwashima)
This patch adds test cases to verify the new GC. We run each test for the following cases:

  * SOCK_DGRAM
  * SOCK_STREAM without embryo socket
  * SOCK_STREAM without embryo socket + MSG_OOB
  * SOCK_STREAM with embryo sockets
  * SOCK_STREAM with embryo sockets + MSG_OOB

Before and after running each test case, we ensure that there is no AF_UNIX socket left in the netns by reading /proc/net/protocols. We cannot use /proc/net/unix and UNIX_DIAG because the embryo socket does not show up there.

Each test creates multiple sockets in an array. We pass sockets in the even index using the peer sockets in the odd index. So, send_fd(0, 1) actually sends fd[0] to fd[2] via fd[0 + 1].

  Test 1 : A <-> A
  Test 2 : A <-> B
  Test 3 : A -> B -> C <- D
               ^.___|___.' ^
                `---------'

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-16-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
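The primitive being stress-tested above is SCM_RIGHTS file-descriptor passing over AF_UNIX. As background, a minimal sketch of the send side is shown here; the helper name is made up and this is not the selftest source.

  /* Editorial sketch of SCM_RIGHTS fd passing, not the selftest code. */
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  int send_fd_over_unix(int via, int fd_to_pass)
  {
          char data = 'x';
          struct iovec iov = { .iov_base = &data, .iov_len = 1 };
          union {
                  struct cmsghdr align;
                  char buf[CMSG_SPACE(sizeof(int))];
          } u;
          struct msghdr msg = {
                  .msg_iov = &iov,
                  .msg_iovlen = 1,
                  .msg_control = u.buf,
                  .msg_controllen = sizeof(u.buf),
          };
          struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

          cmsg->cmsg_level = SOL_SOCKET;
          cmsg->cmsg_type = SCM_RIGHTS;
          cmsg->cmsg_len = CMSG_LEN(sizeof(int));
          memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

          /* Passing AF_UNIX sockets themselves is what can create the
           * reference cycles that only the in-kernel GC can break. */
          return sendmsg(via, &msg, 0);
  }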
2024-03-29  selftests: net: gro fwd: update vxlan GRO test expectations  (Antoine Tenart)
UDP tunnel packets can't be GROed in between their endpoints as this causes different issues. The UDP GRO fwd vxlan tests were relying on this and their expectations have to be fixed. We keep both vxlan tests and expect no GRO to happen. The vxlan UDP GRO bench test was removed as it's not providing any valuable information now. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29  x86/selftests: Skip the tests if prerequisites aren't fulfilled  (Muhammad Usama Anjum)
Skip instead of failing when prerequisite conditions aren't fulfilled, such as invalid xstate values etc. Make the tests show as 'SKIP' when run:

  make -C tools/testing/selftests/ TARGETS=x86 run_tests
  ...
  # timeout set to 45
  # selftests: x86: amx_64
  # # xstate cpuid: invalid tile data size/offset: 0/0
  ok 42 selftests: x86: amx_64 # SKIP
  # timeout set to 45
  # selftests: x86: lam_64
  # # Unsupported LAM feature!
  ok 43 selftests: x86: lam_64 # SKIP
  ...

In the AMX test, move away from check_cpuid_xsave() and start using arch_prctl() to find out whether AMX support is present or not. In kernels where AMX isn't present, arch_prctl() returns -EINVAL, hence it is backward compatible.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240327111720.3509180-1-usama.anjum@collabora.com
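A rough sketch of the skip-instead-of-fail pattern follows. It is an assumption-laden illustration rather than the patch itself: KSFT_SKIP (4) is the standard kselftest skip exit code, but the specific arch_prctl() command shown (ARCH_GET_XCOMP_SUPP) is only one plausible probe; the real amx test may use a different request.

  /* Hedged sketch: probe, then skip rather than fail on old kernels. */
  #include <asm/prctl.h>          /* ARCH_GET_XCOMP_SUPP (assumed probe) */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #define KSFT_SKIP 4             /* kselftest "skipped" exit code */

  int main(void)
  {
          unsigned long features = 0;

          if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features)) {
                  if (errno == EINVAL) {
                          /* Kernel predates dynamic xstate: skip, don't fail. */
                          printf("xstate interface not available [SKIP]\n");
                          return KSFT_SKIP;
                  }
                  perror("arch_prctl");
                  return 1;
          }
          printf("supported xstate features: 0x%lx\n", features);
          return 0;
  }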
2024-03-28  selftests/bpf: Drop settimeo in do_test  (Geliang Tang)
settimeo is invoked in start_server() and in connect_fd_to_fd() already, so there is no need to invoke settimeo(lfd, 0) and settimeo(fd, 0) in do_test() anymore. This patch drops them. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/dbc3613bee3b1c78f95ac9ff468bf47c92f106ea.1711447102.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-28  selftests/bpf: Use connect_fd_to_fd in bpf_tcp_ca  (Geliang Tang)
To simplify the code, use BPF selftests helper connect_fd_to_fd() in bpf_tcp_ca.c instead of open-coding it. This helper is defined in network_helpers.c, and exported in network_helpers.h, which is already included in bpf_tcp_ca.c. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/e105d1f225c643bee838409378dd90fd9aabb6dc.1711447102.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-28  tools/resolve_btfids: fix build with musl libc  (Natanael Copa)
Include the header that defines u32. This fixes the build of the 6.6.23 and 6.1.83 kernels for Alpine Linux, which uses musl libc. I assume that GNU libc indirectly pulls in linux/types.h. Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218647 Cc: stable@vger.kernel.org Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Tested-by: Greg Thelen <gthelen@google.com> Link: https://lore.kernel.org/r/20240328110103.28734-1-ncopa@alpinelinux.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Add a kprobe_multi subtest to use addrs instead of syms  (Yonghong Song)
Get addrs directly from available_filter_functions_addrs and send them to the kernel during kprobe_multi_attach. This avoids consulting /proc/kallsyms. But available_filter_functions_addrs was only introduced in 6.5, i.e. fairly recently, so the test is skipped if the kernel does not support it. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041523.1200301-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Fix kprobe_multi_bench_attach test failure with LTO kernel  (Yonghong Song)
In my locally built clang LTO kernel (enabling CONFIG_LTO and CONFIG_LTO_CLANG_THIN), the kprobe_multi_bench_attach/kernel subtest failed like:

  test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec
  test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
  libbpf: prog 'test_kprobe_empty': failed to attach: No such process
  test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -3
  #117/1 kprobe_multi_bench_attach/kernel:FAIL

Multiple symbols in /sys/kernel/debug/tracing/available_filter_functions are renamed in /proc/kallsyms due to cross-file inlining. One example is the static function __access_remote_vm in mm/memory.c. In a non-LTO kernel, we have the following call stack:

  ptrace_access_vm (global, kernel/ptrace.c)
    access_remote_vm (global, mm/memory.c)
      __access_remote_vm (static, mm/memory.c)

With an LTO kernel, it is possible that access_remote_vm() is inlined by ptrace_access_vm(). So we end up with the following call stack:

  ptrace_access_vm (global, kernel/ptrace.c)
    __access_remote_vm (static, mm/memory.c)

The compiler renames __access_remote_vm to __access_remote_vm.llvm.<hash> to prevent a potential name collision. The kernel bpf_kprobe_multi_link_attach() and ftrace_lookup_symbols() try to find addresses based on /proc/kallsyms, hence the current test failed with an LTO kernel. This patch consults /proc/kallsyms to find the corresponding entries for the ksym, which solves the issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041518.1199758-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Add {load,search}_kallsyms_custom_local()  (Yonghong Song)
These two functions allow selftests to do loading/searching kallsyms based on their specific compare functions. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041513.1199440-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Refactor trace helper func load_kallsyms_local()  (Yonghong Song)
Refactor trace helper function load_kallsyms_local() such that it invokes a common function with a compare function as input. The common function will be used later for other local functions. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041508.1199239-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Refactor some functions for kprobe_multi_test  (Yonghong Song)
Refactor some functions in kprobe_multi_test.c to extract a few helper functions that will be used in later patches, to avoid code duplication. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041503.1198982-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  libbpf: Handle <orig_name>.llvm.<hash> symbol properly  (Yonghong Song)
With CONFIG_LTO_CLANG_THIN enabled, with some previous versions of the kernel code base ([1]), I hit the following error:

  test_ksyms:PASS:kallsyms_fopen 0 nsec
  test_ksyms:FAIL:ksym_find symbol 'bpf_link_fops' not found
  #118 ksyms:FAIL

The reason is that 'bpf_link_fops' is renamed to

  bpf_link_fops.llvm.8325593422554671469

Due to cross-file inlining, the static variable 'bpf_link_fops' in syscall.c is used by a function in another file. To avoid potential duplicated names, llvm added the suffix '.llvm.<hash>' ([2]) to the 'bpf_link_fops' variable. Such renaming caused a problem in libbpf if 'bpf_link_fops' is used in a bpf prog as a ksym but 'bpf_link_fops' does not match any symbol in /proc/kallsyms. To fix this issue, libbpf needs to understand that the suffix '.llvm.<hash>' is caused by a clang lto kernel and to process such symbols properly.

With the latest bpf-next code base built with CONFIG_LTO_CLANG_THIN, I cannot reproduce the above failure any more. But such an issue could happen with other symbols or in the future for the bpf_link_fops symbol. For example, with my current kernel, I got the following from /proc/kallsyms:

  ffffffff84782154 d __func__.net_ratelimit.llvm.6135436931166841955
  ffffffff85f0a500 d tk_core.llvm.726630847145216431
  ffffffff85fdb960 d __fs_reclaim_map.llvm.10487989720912350772
  ffffffff864c7300 d fake_dst_ops.llvm.54750082607048300

I could not easily create a selftest to test the newly-added libbpf functionality with a static C test since I do not know which symbol is cross-file inlined. But based on my particular kernel, the following test change can run successfully.

  > diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c b/tools/testing/selftests/bpf/prog_tests/ksyms.c
  > index 6a86d1f07800..904a103f7b1d 100644
  > --- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
  > +++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
  > @@ -42,6 +42,7 @@ void test_ksyms(void)
  >    ASSERT_EQ(data->out__bpf_link_fops, link_fops_addr, "bpf_link_fops");
  >    ASSERT_EQ(data->out__bpf_link_fops1, 0, "bpf_link_fops1");
  >    ASSERT_EQ(data->out__btf_size, btf_size, "btf_size");
  > +  ASSERT_NEQ(data->out__fake_dst_ops, 0, "fake_dst_ops");
  >    ASSERT_EQ(data->out__per_cpu_start, per_cpu_start_addr, "__per_cpu_start");
  >
  >  cleanup:
  > diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c b/tools/testing/selftests/bpf/progs/test_ksyms.c
  > index 6c9cbb5a3bdf..fe91eef54b66 100644
  > --- a/tools/testing/selftests/bpf/progs/test_ksyms.c
  > +++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
  > @@ -9,11 +9,13 @@ __u64 out__bpf_link_fops = -1;
  >  __u64 out__bpf_link_fops1 = -1;
  >  __u64 out__btf_size = -1;
  >  __u64 out__per_cpu_start = -1;
  > +__u64 out__fake_dst_ops = -1;
  >
  >  extern const void bpf_link_fops __ksym;
  >  extern const void __start_BTF __ksym;
  >  extern const void __stop_BTF __ksym;
  >  extern const void __per_cpu_start __ksym;
  > +extern const void fake_dst_ops __ksym;
  >  /* non-existing symbol, weak, default to zero */
  >  extern const void bpf_link_fops1 __ksym __weak;
  >
  > @@ -23,6 +25,7 @@ int handler(const void *ctx)
  >    out__bpf_link_fops = (__u64)&bpf_link_fops;
  >    out__btf_size = (__u64)(&__stop_BTF - &__start_BTF);
  >    out__per_cpu_start = (__u64)&__per_cpu_start;
  > +  out__fake_dst_ops = (__u64)&fake_dst_ops;
  >
  >    out__bpf_link_fops1 = (__u64)&bpf_link_fops1;

This patch fixed the issue in libbpf such that the suffix '.llvm.<hash>' will be ignored during the comparison of a bpf prog ksym against the symbols in /proc/kallsyms. Currently, only static variables in /proc/kallsyms are checked for the '.llvm.<hash>' suffix, since in bpf programs function ksyms with a '.llvm.<hash>' suffix are most likely kfuncs and unlikely to be cross-file inlined. Note that the kernel currently does not support gcc builds with lto.

  [1] https://lore.kernel.org/bpf/20240302165017.1627295-1-yonghong.song@linux.dev/
  [2] https://github.com/llvm/llvm-project/blob/release/18.x/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1714-L1719

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240326041458.1198161-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
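The matching rule described above can be summarised in a few lines. The helper below is an editorial sketch with a made-up name, not libbpf's internal API; it just shows the "ignore a trailing .llvm.<hash>" comparison.

  /* Sketch of the suffix-tolerant comparison; name is illustrative only. */
  #include <stdbool.h>
  #include <string.h>

  static bool ksym_name_matches(const char *kallsyms_name, const char *ksym_name)
  {
          size_t len = strlen(ksym_name);

          if (strncmp(kallsyms_name, ksym_name, len) != 0)
                  return false;
          /* exact match, or an LTO-mangled "<name>.llvm.<hash>" variant */
          return kallsyms_name[len] == '\0' ||
                 strncmp(kallsyms_name + len, ".llvm.", 6) == 0;
  }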
2024-03-28  libbpf: Mark libbpf_kallsyms_parse static function  (Yonghong Song)
Currently the libbpf_kallsyms_parse() function is declared as a global function, but it is not actually an API and there are no external users in bpftool/bpf-selftests. So let us mark the function as static. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041453.1197949-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Replace CHECK with ASSERT macros for ksyms test  (Yonghong Song)
Replace CHECK with ASSERT macros for the ksyms tests. This test failed earlier with a clang LTO kernel, but the issue is gone with the latest code base. Replacing CHECK with ASSERT still improves the code, as ASSERT is preferred in selftests. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041448.1197812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Test loading bpf-tcp-cc prog calling the kernel tcp-cc kfuncs  (Martin KaFai Lau)
This patch adds a test to ensure all static tcp-cc kfuncs are visible to the struct_ops bpf programs. This is checked by successfully loading struct_ops programs that call these tcp-cc kfuncs. The patch needs to enable CONFIG_TCP_CONG_DCTCP and CONFIG_TCP_CONG_BBR. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240322191433.4133280-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: add batched tp/raw_tp/fmodret tests  (Andrii Nakryiko)
Utilize bpf_modify_return_test_tp() kfunc to have a fast way to trigger tp/raw_tp/fmodret programs from another BPF program, which gives us comparable batched benchmarks to (batched) kprobe/fentry benchmarks. We don't switch kprobe/fentry batched benchmarks to this kfunc to make bench tool usable on older kernels as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: lazy-load trigger bench BPF programs  (Andrii Nakryiko)
Instead of front-loading all possible benchmarking BPF programs for trigger benchmarks, explicitly specify which BPF program is used by a specific benchmark and load only it. This allows more flexibility in supporting older kernels, where some program types might not be possible to load (e.g., those that rely on a newly added kfunc). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: remove syscall-driven benchs, keep syscall-count only  (Andrii Nakryiko)
Remove "legacy" benchmarks triggered by syscalls in favor of newly added in-kernel/batched benchmarks. Drop -batched suffix now as well. Next patch will restore "feature parity" by adding back tp/raw_tp/fmodret benchmarks based on in-kernel kfunc approach. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: add batched, mostly in-kernel BPF triggering benchmarks  (Andrii Nakryiko)
Existing kprobe/fentry triggering benchmarks have 1-to-1 mapping between one syscall execution and BPF program run. While we use a fast get_pgid() syscall, syscall overhead can still be non-trivial.

This patch adds kprobe/fentry set of benchmarks significantly amortizing the cost of syscall vs actual BPF triggering overhead. We do this by employing BPF_PROG_TEST_RUN command to trigger "driver" raw_tp program which does a tight parameterized loop calling cheap BPF helper (bpf_get_numa_node_id()), to which kprobe/fentry programs are attached for benchmarking. This way 1 bpf() syscall causes N executions of BPF program being benchmarked. N defaults to 100, but can be adjusted with --trig-batch-iters CLI argument.

For comparison we also implement a new baseline program that instead of triggering another BPF program just does N atomic per-CPU counter increments, establishing the limit for all other types of program within this batched benchmarking setup.

Taking the final set of benchmarks added in this patch set (including tp/raw_tp/fmodret, added in later patch), and keeping for now "legacy" syscall-driven benchmarks, we can capture all triggering benchmarks in one place for comparison, before we remove the legacy ones (and rename xxx-batched into just xxx).

  $ benchs/run_bench_trigger.sh
  usermode-count         :   79.500 ± 0.024M/s
  kernel-count           :   49.949 ± 0.081M/s
  syscall-count          :    9.009 ± 0.007M/s
  fentry-batch           :   31.002 ± 0.015M/s
  fexit-batch            :   20.372 ± 0.028M/s
  fmodret-batch          :   21.651 ± 0.659M/s
  rawtp-batch            :   36.775 ± 0.264M/s
  tp-batch               :   19.411 ± 0.248M/s
  kprobe-batch           :   12.949 ± 0.220M/s
  kprobe-multi-batch     :   15.400 ± 0.007M/s
  kretprobe-batch        :    5.559 ± 0.011M/s
  kretprobe-multi-batch  :    5.861 ± 0.003M/s
  fentry-legacy          :    8.329 ± 0.004M/s
  fexit-legacy           :    6.239 ± 0.003M/s
  fmodret-legacy         :    6.595 ± 0.001M/s
  rawtp-legacy           :    8.305 ± 0.004M/s
  tp-legacy              :    6.382 ± 0.001M/s
  kprobe-legacy          :    5.528 ± 0.003M/s
  kprobe-multi-legacy    :    5.864 ± 0.022M/s
  kretprobe-legacy       :    3.081 ± 0.001M/s
  kretprobe-multi-legacy :    3.193 ± 0.001M/s

Note how xxx-batch variants are measured with significantly higher throughput, even though it's exactly the same in-kernel overhead. As such, results can be compared only between benchmarks of the same kind (syscall vs batched):

  fentry-legacy          :    8.329 ± 0.004M/s
  fentry-batch           :   31.002 ± 0.015M/s

  kprobe-multi-legacy    :    5.864 ± 0.022M/s
  kprobe-multi-batch     :   15.400 ± 0.007M/s

Note also that syscall-count is setting a theoretical limit for syscall-triggered benchmarks, while kernel-count is setting similar limits for batch variants. usermode-count is a happy and unachievable case of user space counting without doing any syscalls, and is mostly the measure of CPU speed for such a trivial benchmark.

As was mentioned, tp/raw_tp/fmodret require kernel-side kfunc to produce similar benchmark, which we address in a separate patch.

Note that run_bench_trigger.sh allows to override a list of benchmarks to run, which is very useful for performance work.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
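To visualise the "driver" idea described above, here is a rough sketch of what such a raw_tp driver program could look like. Program and variable names (trigger_driver, batch_iters) are illustrative, not the actual benchmark sources; the real bench wires this up through BPF_PROG_TEST_RUN and attaches the program under test to the helper's underlying kernel function.

  // SPDX-License-Identifier: GPL-2.0
  /* Illustrative driver program; names are made up, not the bench sources. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  const volatile int batch_iters = 100;   /* set from user space before load */

  SEC("raw_tp")
  int trigger_driver(void *ctx)
  {
          int i;

          /* Each iteration calls a cheap helper; the kprobe/fentry program
           * being benchmarked is attached to that helper's kernel function,
           * so one test-run syscall yields batch_iters executions of it. */
          for (i = 0; i < batch_iters; i++)
                  (void)bpf_get_numa_node_id();

          return 0;
  }

  char _license[] SEC("license") = "GPL";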
2024-03-28  selftests/bpf: rename and clean up userspace-triggered benchmarks  (Andrii Nakryiko)
Rename uprobe-base to more precise usermode-count (it will match other baseline-like benchmarks, kernel-count and syscall-count). Also use BENCH_TRIG_USERMODE() macro to define all usermode-based triggering benchmarks, which include usermode-count and uprobe/uretprobe benchmarks. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  bpf: improve error message for unsupported helper  (Mykyta Yatsenko)
The BPF verifier emits an "unknown func" message when a given BPF program type does not support a BPF helper. This message may be confusing for users, as the important context that the helper is unknown only to the current program type is not provided. This patch changes the message to "program of this type cannot use helper " and aligns dependent code in libbpf and tests. Any suggestions on improving/changing this message are welcome. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  selftests/bpf: Add BPF_FIB_LOOKUP_MARK tests  (Anton Protopopov)
This patch extends the fib_lookup test suite by adding a few test cases for each IP family to test the new BPF_FIB_LOOKUP_MARK flag to the bpf_fib_lookup: * Test destination IP address selection with and without a mark and/or the BPF_FIB_LOOKUP_MARK flag set Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28  bpf: Add support for passing mark with bpf_fib_lookup  (Anton Protopopov)
Extend the bpf_fib_lookup() helper by making it utilize the mark if the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark, four bytes of struct bpf_fib_lookup are used, shared with the output-only smac/dmac fields. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
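Sketch of how a TC program might use the new flag, per the description above. Treat it as an assumption-heavy illustration: BPF_FIB_LOOKUP_MARK comes from the patch, but the field name 'mark' inside struct bpf_fib_lookup and the fallback flag value below are guesses to be checked against the updated UAPI header.

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  #ifndef AF_INET
  #define AF_INET 2
  #endif

  #ifndef BPF_FIB_LOOKUP_MARK
  #define BPF_FIB_LOOKUP_MARK (1U << 5)   /* placeholder; use the real UAPI value */
  #endif

  SEC("tc")
  int fib_lookup_with_mark(struct __sk_buff *skb)
  {
          struct bpf_fib_lookup params = {
                  .family   = AF_INET,
                  .ifindex  = skb->ingress_ifindex,
                  .ipv4_dst = bpf_htonl(0x08080808),      /* 8.8.8.8 */
                  .mark     = 0x42,       /* assumed field carrying the fwmark */
          };
          long rc;

          rc = bpf_fib_lookup(skb, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
          return rc == BPF_FIB_LKUP_RET_SUCCESS ? TC_ACT_OK : TC_ACT_SHOT;
  }

  char _license[] SEC("license") = "GPL";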
2024-03-28  ynl: support hex display_hint for integer  (Hangbin Liu)
Sometimes it is convenient to read an integer as hex, like mask values. Suggested-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240327123130.1322921-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28  selftests: forwarding: Add a test for testing lib.sh functionality  (Petr Machata)
Rerunning various scenarios to make sure lib.sh changes do not impact the observable behavior is no fun. Add a selftest at least for the bare basics -- the mechanics of setting RET, retmsg, and EXIT_STATUS. Since the selftest itself uses lib.sh, it would be possible to break lib.sh in such a way that invalidates result of the selftest. Since the metatest only uses the bare basics (just pass/fail), hopefully such fundamental breakages would be noticed. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/6d25cedbf2d4b83614944809a34fe023fbe8db38.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28  selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth  (Petr Machata)
When the NH group stats tests are currently run on a veth topology, the HW-stats leg of each test is SKIP'ped. But kernel networking CI interprets skips as a sign that tooling is missing, and prompts maintainer investigation. Lack of capability to pass a test should be expressed as XFAIL. Selftests that require HW should normally be put in drivers/net/hw, but doing so for the NH counter selftests would just lead to a lot of duplication. So instead, introduce a helper, xfail_on_veth(), which can be used to mark selftests that should XFAIL instead of FAILing when run on a veth topology. On non-veth topology, they don't do anything. Use the helper in the HW-stats part of the router_mpath_nh_lib selftest. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/15f0ab9637aa0497f164ec30e83c1c8f53d53719.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28  selftests: forwarding: Mark performance-sensitive tests  (Petr Machata)
When run on a slow machine, the scheduler traffic tests can be expected to fail, and should be reported as XFAIL in that case. Therefore run these tests through the perf_sensitive wrapper. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/9a357f8cf34f5ececac08d43a3eb023008996035.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28  selftests: forwarding: Support for performance sensitive tests  (Petr Machata)
Several tests in the suite use large amounts of traffic to e.g. cause congestion and evaluate RED or shaper performance. These tests will not run well on a slow machine, be it one with heavy debug kernel, or a VM, or e.g. a single-board computer. Allow users to specify an environment variable, KSFT_MACHINE_SLOW=yes, to indicate that the tests are being run on one such machine.

Performance sensitive tests can then use a new helper, xfail_on_slow(), to mark parts of the test that are sensitive to low-performance machines. The helper can be used to just mark the whole suite, like so:

  xfail_on_slow tests_run

... or, on the other side of the granularity spectrum, to override individual checks:

  xfail_on_slow check_err $? "Expected much, got little."

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/99a376a2d2ffdaeee7752b1910cb0c3ea5d80fbe.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>