linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-10-11	selftests/bpf: Make sure mount directory exists	Daan De Meyer
	The mount directory for the selftests cgroup tree might not exist so let's make sure it does exist by creating it ourselves if it doesn't exist. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-9-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpftool: Add support for cgroup unix socket address hooks	Daan De Meyer
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into bpftool. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20231011185113.140426-7-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	libbpf: Add support for cgroup unix socket address hooks	Daan De Meyer
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into libbpf. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpf: Implement cgroup sockaddr hooks for unix sockets	Daan De Meyer
	These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	tools: ynl: use ynl-gen -o instead of stdout in Makefile	Jakub Kicinski
	Jiri added more careful handling of output of the code generator to avoid wiping out existing files in commit f65f305ae008 ("tools: ynl-gen: use temporary file for rendering") Make use of the -o option in the Makefiles, it is already used by ynl-regen.sh. Link: https://lore.kernel.org/r/20231010202714.4045168-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11	selftests/bpf: Add missing section name tests for getpeername/getsockname	Daan De Meyer
	These were missed when these hooks were first added so add them now instead to make sure every sockaddr hook has a matching section name test. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-2-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-10	Merge tag 'hyperv-fixes-signed-20231009' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - fixes for Hyper-V VTL code (Saurabh Sengar and Olaf Hering) - fix hv_kvp_daemon to support keyfile based connection profile (Shradha Gupta) * tag 'hyperv-fixes-signed-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv/hv_kvp_daemon:Support for keyfile based connection profile hyperv: reduce size of ms_hyperv_info x86/hyperv: Add common print prefix "Hyper-V" in hv_init x86/hyperv: Remove hv_vtl_early_init initcall x86/hyperv: Restrict get_vtl to only VTL platforms
2023-10-10	hv/hv_kvp_daemon:Support for keyfile based connection profile	Shradha Gupta
	Ifcfg config file support in NetworkManger is deprecated. This patch provides support for the new keyfile config format for connection profiles in NetworkManager. The patch modifies the hv_kvp_daemon code to generate the new network configuration in keyfile format(.ini-style format) along with a ifcfg format configuration. The ifcfg format configuration is also retained to support easy backward compatibility for distro vendors. These configurations are stored in temp files which are further translated using the hv_set_ifconfig.sh script. This script is implemented by individual distros based on the network management commands supported. For example, RHEL's implementation could be found here: https://gitlab.com/redhat/centos-stream/src/hyperv-daemons/-/blob/c9s/hv_set_ifconfig.sh Debian's implementation could be found here: https://github.com/endlessm/linux/blob/master/debian/cloud-tools/hv_set_ifconfig The next part of this support is to let the Distro vendors consume these modified implementations to the new configuration format. Tested-on: Rhel9(Hyper-V, Azure)(nm and ifcfg files verified) Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/1696847920-31125-1-git-send-email-shradhagupta@linux.microsoft.com
2023-10-09	tools: ynl-gen: handle do ops with no input attrs	Jakub Kicinski
	The code supports dumps with no input attributes currently thru a combination of special-casing and luck. Clean up the handling of ops with no inputs. Create empty Structs, and skip printing of empty types. This makes dos with no inputs work. Tested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Link: https://lore.kernel.org/r/20231006135032.3328523-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-09	selftests/bpf: Add BPF_FIB_LOOKUP_SRC tests	Martynas Pumputis
	This patch extends the existing fib_lookup test suite by adding two test cases (for each IP family): * Test source IP selection from the egressing netdev. * Test source IP selection when an IP route has a preferred src IP addr. Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-3-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09	bpf: Derive source IP addr via bpf_*_fib_lookup()	Martynas Pumputis
	Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC \| BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09	selftests/bpf: Add testcase for async callback return value failure	David Vernet
	A previous commit updated the verifier to print an accurate failure message for when someone specifies a nonzero return value from an async callback. This adds a testcase for validating that the verifier emits the correct message in such a case. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231009161414.235829-2-void@manifault.com
2023-10-09	bpftool: Align bpf_load_and_run_opts insns and data	Ian Rogers
	A C string lacks alignment so use aligned arrays to avoid potential alignment problems. Switch to using sizeof (less 1 for the \0 terminator) rather than a hardcode size constant. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20231007044439.25171-2-irogers@google.com
2023-10-09	bpftool: Align output skeleton ELF code	Ian Rogers
	libbpf accesses the ELF data requiring at least 8 byte alignment, however, the data is generated into a C string that doesn't guarantee alignment. Fix this by assigning to an aligned char array. Use sizeof on the array, less one for the \0 terminator, rather than generating a constant. Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20231007044439.25171-1-irogers@google.com
2023-10-09	selftests/bpf: Test pinning bpf timer to a core	David Vernet
	Now that we support pinning a BPF timer to the current core, we should test it with some selftests. This patch adds two new testcases to the timer suite, which verifies that a BPF timer both with and without BPF_F_TIMER_ABS, can be pinned to the calling core with BPF_F_TIMER_CPU_PIN. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-3-void@manifault.com
2023-10-09	bpf: Add ability to pin bpf timer to calling CPU	David Vernet
	BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
2023-10-06	selftests/bpf: Make seen_tc* variable tests more robust	Daniel Borkmann
	Martin reported that on his local dev machine the test_tc_chain_mixed() fails as "test_tc_chain_mixed:FAIL:seen_tc5 unexpected seen_tc5: actual 1 != expected 0" and others occasionally, too. However, when running in a more isolated setup (qemu in particular), it works fine for him. The reason is that there is a small race-window where seen_tc* could turn into true for various test cases when there is background traffic, e.g. after the asserts they often get reset. In such case when subsequent detach takes place, unrelated background traffic could have already flipped the bool to true beforehand. Add a small helper tc_skel_reset_all_seen() to reset all bools before we do the ping test. At this point, everything is set up as expected and therefore no race can occur. All tc_{opts,links} tests continue to pass after this change. Reported-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-7-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Test query on empty mprog and pass revision into attach	Daniel Borkmann
	Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Adapt assert_mprog_count to always expect 0 count	Daniel Borkmann
	Simplify __assert_mprog_count() to remove the -ENOENT corner case as the bpf_prog_query() now returns 0 when no bpf_mprog is attached. This also allows to convert a few test cases from using raw __assert_mprog_count() over to plain assert_mprog_count() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-5-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall	Daniel Borkmann
	Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Add pairs_redir_to_connected helper	Geliang Tang
	Extract duplicate code from these four functions unix_redir_to_connected() udp_redir_to_connected() inet_unix_redir_to_connected() unix_inet_redir_to_connected() to generate a new helper pairs_redir_to_connected(). Create the different socketpairs in these four functions, then pass the socketpairs info to the new common helper to do the connections. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/54bb28dcf764e7d4227ab160883931d2173f4f3d.1696588133.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Don't truncate #test/subtest field	Andrii Nakryiko
	We currently expect up to a three-digit number of tests and subtests, so: #999/999: some_test/some_subtest: ... Is the largest test/subtest we can see. If we happen to cross into 1000s, current logic will just truncate everything after 7th character. This patch fixes this truncate and allows to go way higher (up to 31 characters in total). We still nicely align test numbers: #60/66 core_reloc_btfgen/type_based___incompat:OK #60/67 core_reloc_btfgen/type_based___fn_wrong_args:OK #60/68 core_reloc_btfgen/type_id:OK #60/69 core_reloc_btfgen/type_id___missing_targets:OK #60/70 core_reloc_btfgen/enumval:OK Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-3-andrii@kernel.org
2023-10-06	selftests/bpf: Support building selftests in optimized -O2 mode	Andrii Nakryiko
	Add support for building selftests with -O2 level of optimization, which allows more compiler warnings detection (like lots of potentially uninitialized usage), but also is useful to have a faster-running test for some CPU-intensive tests. One can build optimized versions of libbpf and selftests by running: $ make RELEASE=1 There is a measurable speed up of about 10 seconds for me locally, though it's mostly capped by non-parallelized serial tests. User CPU time goes down by total 40 seconds, from 1m10s to 0m28s. Unoptimized build (-O0) ======================= Summary: 430/3544 PASSED, 25 SKIPPED, 4 FAILED real 1m59.937s user 1m10.877s sys 3m14.880s Optimized build (-O2) ===================== Summary: 425/3543 PASSED, 25 SKIPPED, 9 FAILED real 1m50.540s user 0m28.406s sys 3m13.198s Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-2-andrii@kernel.org
2023-10-06	selftests/bpf: Fix compiler warnings reported in -O2 mode	Andrii Nakryiko
	Fix a bunch of potentially unitialized variable usage warnings that are reported by GCC in -O2 mode. Also silence overzealous stringop-truncation class of warnings. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-1-andrii@kernel.org
2023-10-06	tools: ynl-gen: use uapi header name for the header guard	Jakub Kicinski
	Chuck points out that we should use the uapi-header property when generating the guard. Otherwise we may generate the same guard as another file in the tree. Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-05	selftests/bpf: Enable CONFIG_VSOCKETS in config	Geliang Tang
	CONFIG_VSOCKETS is required by BPF selftests, otherwise we get errors like this: ./test_progs:socket_loopback_reuseport:386: socket: Address family not supported by protocol socket_loopback_reuseport:FAIL:386 ./test_progs:vsock_unix_redir_connectible:1496: vsock_socketpair_connectible() failed vsock_unix_redir_connectible:FAIL:1496 So this patch enables it in tools/testing/selftests/bpf/config. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/472e73d285db2ea59aca9bbb95eb5d4048327588.1696490003.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-05	KVM: selftests: Zero-initialize entire test_result in memslot perf test	Sean Christopherson
	Zero-initialize the entire test_result structure used by memslot_perf_test instead of zeroing only the fields used to guard the pr_info() calls. gcc 13.2.0 is a bit overzealous and incorrectly thinks that rbestslottime's slot_runtime may be used uninitialized. In file included from memslot_perf_test.c:25: memslot_perf_test.c: In function ‘main’: include/test_util.h:31:22: error: ‘rbestslottime.slot_runtime.tv_nsec’ may be used uninitialized [-Werror=maybe-uninitialized] 31 \| #define pr_info(...) printf(__VA_ARGS__) \| ^~~~~~~~~~~~~~~~~~~ memslot_perf_test.c:1127:17: note: in expansion of macro ‘pr_info’ 1127 \| pr_info("Best slot setup time for the whole test area was %ld.%.9lds\n", \| ^~~~~~~ memslot_perf_test.c:1092:28: note: ‘rbestslottime.slot_runtime.tv_nsec’ was declared here 1092 \| struct test_result rbestslottime; \| ^~~~~~~~~~~~~ include/test_util.h:31:22: error: ‘rbestslottime.slot_runtime.tv_sec’ may be used uninitialized [-Werror=maybe-uninitialized] 31 \| #define pr_info(...) printf(__VA_ARGS__) \| ^~~~~~~~~~~~~~~~~~~ memslot_perf_test.c:1127:17: note: in expansion of macro ‘pr_info’ 1127 \| pr_info("Best slot setup time for the whole test area was %ld.%.9lds\n", \| ^~~~~~~ memslot_perf_test.c:1092:28: note: ‘rbestslottime.slot_runtime.tv_sec’ was declared here 1092 \| struct test_result rbestslottime; \| ^~~~~~~~~~~~~ That can't actually happen, at least not without the "result" structure in test_loop() also being used uninitialized, which gcc doesn't complain about, as writes to rbestslottime are all-or-nothing, i.e. slottimens can't be non-zero without slot_runtime being written. if (!data->mem_size && (!rbestslottime->slottimens \|\| result.slottimens < rbestslottime->slottimens)) *rbestslottime = result; Zero-initialize the structures to make gcc happy even though this is likely a compiler bug. The cost to do so is negligible, both in terms of code and runtime overhead. The only downside is that the compiler won't warn about legitimate usage of "uninitialized" data, e.g. the test could end up consuming zeros instead of useful data. However, given that the test is quite mature and unlikely to see substantial changes, the odds of introducing such bugs are relatively low, whereas being able to compile KVM selftests with -Werror detects issues on a regular basis. Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20231005002954.2887098-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-05	Merge tag 'net-6.6-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, netfilter, BPF and WiFi. I didn't collect precise data but feels like we've got a lot of 6.5 fixes here. WiFi fixes are most user-awaited. Current release - regressions: - Bluetooth: fix hci_link_tx_to RCU lock usage Current release - new code bugs: - bpf: mprog: fix maximum program check on mprog attachment - eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns() Previous releases - regressions: - ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling - vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it doesn't handle zero length like we expected - wifi: - cfg80211: fix cqm_config access race, fix crashes with brcmfmac - iwlwifi: mvm: handle PS changes in vif_cfg_changed - mac80211: fix mesh id corruption on 32 bit systems - mt76: mt76x02: fix MT76x0 external LNA gain handling - Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - l2tp: fix handling of transhdrlen in __ip{,6}_append_data() - dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent - eth: stmmac: fix the incorrect parameter after refactoring Previous releases - always broken: - net: replace calls to sock->ops->connect() with kernel_connect(), prevent address rewrite in kernel_bind(); otherwise BPF hooks may modify arguments, unexpectedly to the caller - tcp: fix delayed ACKs when reads and writes align with MSS - bpf: - verifier: unconditionally reset backtrack_state masks on global func exit - s390: let arch_prepare_bpf_trampoline return program size, fix struct_ops offsets - sockmap: fix accounting of available bytes in presence of PEEKs - sockmap: reject sk_msg egress redirects to non-TCP sockets - ipv4/fib: send netlink notify when delete source address routes - ethtool: plca: fix width of reads when parsing netlink commands - netfilter: nft_payload: rebuild vlan header on h_proto access - Bluetooth: hci_codec: fix leaking memory of local_codecs - eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids - eth: stmmac: - dwmac-stm32: fix resume on STM32 MCU - remove buggy and unneeded stmmac_poll_controller, depend on NAPI - ibmveth: always recompute TCP pseudo-header checksum, fix use of the driver with Open vSwitch - wifi: - rtw88: rtw8723d: fix MAC address offset in EEPROM - mt76: fix lock dependency problem for wed_lock - mwifiex: sanity check data reported by the device - iwlwifi: ensure ack flag is properly cleared - iwlwifi: mvm: fix a memory corruption due to bad pointer arithm - iwlwifi: mvm: fix incorrect usage of scan API Misc: - wifi: mac80211: work around Cisco AP 9115 VHT MPDU length" * tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) MAINTAINERS: update Matthieu's email address mptcp: userspace pm allow creating id 0 subflow mptcp: fix delegated action races net: stmmac: remove unneeded stmmac_poll_controller net: lan743x: also select PHYLIB net: ethernet: mediatek: disable irq before schedule napi net: mana: Fix oversized sge0 for GSO packets net: mana: Fix the tso_bytes calculation net: mana: Fix TX CQE error handling netlink: annotate data-races around sk->sk_err sctp: update hb timer immediately after users change hb_interval sctp: update transport state when processing a dupcook packet tcp: fix delayed ACKs for MSS boundary condition tcp: fix quick-ack counting to count actual ACKs of new data page_pool: fix documentation typos tipc: fix a potential deadlock on &tx->lock net: stmmac: dwmac-stm32: fix resume on STM32 MCU ipv4: Set offload_failed flag in fibmatch results netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs ...
2023-10-04	Merge tag 'rtla-v6.6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux Pull rtla fixes from Daniel Bristot de Oliveira: "rtla (Real-Time Linux Analysis) tool fixes. Timerlat auto-analysis: - Timerlat is reporting thread interference time without thread noise events occurrence. It was caused because the thread interference variable was not reset after the analysis of a timerlat activation that did not hit the threshold. - The IRQ handler delay is estimated from the delta of the IRQ latency reported by timerlat, and the timestamp from IRQ handler start event. If the delta is near-zero, the drift from the external clock and the trace event and/or the overhead can cause the value to be negative. If the value is negative, print a zero-delay. - IRQ handlers happening after the timerlat thread event but before the stop tracing were being reported as IRQ that happened before the current IRQ occurrence. Ignore Previous IRQ noise in this condition because they are valid only for the next timerlat activation. Timerlat user-space: - Timerlat is stopping all user-space thread if a CPU becomes offline. Do not stop the entire tool if a CPU is/become offline, but only the thread of the unavailable CPU. Stop the tool only, if all threads leave because the CPUs become/are offline. man-pages: - Fix command line example in timerlat hist man page" * tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux: rtla: fix a example in rtla-timerlat-hist.rst rtla/timerlat: Do not stop user-space if a cpu is offline rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample rtla/timerlat_aa: Fix negative IRQ delay rtla/timerlat_aa: Zero thread sum after every sample analysis
2023-10-04	tools: ynl: use uAPI include magic for samples	Jakub Kicinski
	Makefile.deps provides direct includes in CFLAGS_$(obj). We just need to rewrite the rules to make use of the extra flags, no need to hard-include all of tools/include/uapi. Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231003153416.2479808-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04	tools: ynl: don't regen on every make	Jakub Kicinski
	As far as I can tell the normal Makefile dependency tracking works, generated files get re-generated if the YAML was updated. Let make do its job, don't force the re-generation. make hardclean can be used to force regeneration. Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231003153416.2479808-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04	Merge tag 'nf-23-10-04' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter patches for net First patch resolves a regression with vlan header matching, this was broken since 6.5 release. From myself. Second patch fixes an ancient problem with sctp connection tracking in case INIT_ACK packets are delayed. This comes with a selftest, both patches from Xin Long. Patch 4 extends the existing nftables audit selftest, from Phil Sutter. Patch 5, also from Phil, avoids a situation where nftables would emit an audit record twice. This was broken since 5.13 days. Patch 6, from myself, avoids spurious insertion failure if we encounter an overlapping but expired range during element insertion with the 'nft_set_rbtree' backend. This problem exists since 6.2. * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs selftests: netfilter: Extend nft_audit.sh selftests: netfilter: test for sctp collision processing in nf_conntrack netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp netfilter: nft_payload: rebuild vlan header on h_proto access ==================== Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04	Merge tag 'nf-next-23-09-28' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First patch, from myself, is a bug fix. The issue (connect timeout) is ancient, so I think its safe to give this more soak time given the esoteric conditions needed to trigger this. Also updates the existing selftest to cover this. Add netlink extacks when an update references a non-existent table/chain/set. This allows userspace to provide much better errors to the user, from Pablo Neira Ayuso. Last patch adds more policy checks to nf_tables as a better alternative to the existing runtime checks, from Phil Sutter. * tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY netfilter: nf_tables: missing extended netlink error in lookup functions selftests: netfilter: test nat source port clash resolution interaction with tcp early demux netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash ==================== Link: https://lore.kernel.org/r/20230928144916.18339-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04	selftests/bpf: Add uprobe_multi to gen_tar target	Björn Töpel
	The uprobe_multi program was not picked up for the gen_tar target. Fix by adding it to TEST_GEN_FILES. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-4-bjorn@kernel.org
2023-10-04	selftests/bpf: Enable lld usage for RISC-V	Björn Töpel
	RISC-V has proper lld support. Use that, similar to what x86 does, for urandom_read et al. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-3-bjorn@kernel.org
2023-10-04	selftests/bpf: Add cross-build support for urandom_read et al	Björn Töpel
	Some userland programs in the BPF test suite, e.g. urandom_read, is missing cross-build support. Add cross-build support for these programs Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-2-bjorn@kernel.org
2023-10-04	selftests/bpf: Define SYS_NANOSLEEP_KPROBE_NAME for riscv	Björn Töpel
	Add missing sys_nanosleep name for RISC-V, which is used by some tests (e.g. attach_probe). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-4-bjorn@kernel.org
2023-10-04	selftests/bpf: Define SYS_PREFIX for riscv	Björn Töpel
	SYS_PREFIX was missing for a RISC-V, which made a couple of kprobe tests fail. Add missing SYS_PREFIX for RISC-V. Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-3-bjorn@kernel.org
2023-10-04	libbpf: Fix syscall access arguments on riscv	Alexandre Ghiti
	Since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers"), riscv selects ARCH_HAS_SYSCALL_WRAPPER so let's use the generic implementation of PT_REGS_SYSCALL_REGS(). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-2-bjorn@kernel.org
2023-10-04	KVM: selftests: Remove obsolete and incorrect test case metadata	Like Xu
	Delete inaccurate descriptions and obsolete metadata for test cases. It adds zero value, and has a non-zero chance of becoming stale and misleading in the future. No functional changes intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230914094803.94661-1-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-04	KVM: selftests: Treat %llx like %lx when formatting guest printf	Sean Christopherson
	Treat %ll* formats the same as %l* formats when processing printfs from the guest so that using e.g. %llx instead of %lx generates the expected output. Ideally, unexpected formats would generate compile-time warnings or errors, but it's not at all obvious how to actually accomplish that. Alternatively, guest_vsnprintf() could assert on an unexpected format, but since the vast majority of printfs are for failed guest asserts, getting something printed is better than nothing. E.g. before ==== Test Assertion Failure ==== x86_64/private_mem_conversions_test.c:265: mem[i] == 0 pid=4286 tid=4290 errno=4 - Interrupted system call 1 0x0000000000401c74: __test_mem_conversions at private_mem_conversions_test.c:336 2 0x00007f3aae6076da: ?? ??:0 3 0x00007f3aae32161e: ?? ??:0 Expected 0x0 at offset 0 (gpa 0x%lx), got 0x0 and after ==== Test Assertion Failure ==== x86_64/private_mem_conversions_test.c:265: mem[i] == 0 pid=5664 tid=5668 errno=4 - Interrupted system call 1 0x0000000000401c74: __test_mem_conversions at private_mem_conversions_test.c:336 2 0x00007fbe180076da: ?? ??:0 3 0x00007fbe17d2161e: ?? ??:0 Expected 0x0 at offset 0 (gpa 0x100000000), got 0xcc Fixes: e5119382499c ("KVM: selftests: Add guest_snprintf() to KVM selftests") Cc: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230921171641.3641776-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-04	Merge tag 'linux-kselftest-fixes-6.6-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "One single fix to Makefile to fix the incorrect TARGET name for uevent test" * tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: Fix wrong TARGET in kselftest top level Makefile
2023-10-04	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-02 We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-). The main changes are: 1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko. 2) Several sockmap fixes for available bytes accounting, from John Fastabend. 3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki. 4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov. 5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao. 6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu. 7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use kmalloc_size_roundup() to adjust size_index selftest/bpf: Add various selftests for program limits bpf, mprog: Fix maximum program check on mprog attachment bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets bpf, sockmap: Add tests for MSG_F_PEEK bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf: tcp_read_skb needs to pop skb regardless of seq bpf: unconditionally reset backtrack_state masks on global func exit bpf: Fix tr dereferencing selftests/bpf: Check bpf_cubic_acked() is called via struct_ops s390/bpf: Let arch_prepare_bpf_trampoline return program size ==================== Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04	netfilter: nf_tables: Deduplicate nft_register_obj audit logs	Phil Sutter
	When adding/updating an object, the transaction handler emits suitable audit log entries already, the one in nft_obj_notify() is redundant. To fix that (and retain the audit logging from objects' 'update' callback), Introduce an "audit log free" variant for internal use. Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04	selftests/xsk: Add a test for shared umem feature	Tushar Vyavahare
	Add a new test for testing shared umem feature. This is accomplished by adding a new XDP program and using the multiple sockets. The new XDP program redirects the packets based on the destination MAC address. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-9-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Modify xsk_update_xskmap() to accept the index as an argument	Tushar Vyavahare
	Modify xsk_update_xskmap() to accept the index as an argument, enabling the addition of multiple sockets to xskmap. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-8-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Iterate over all the sockets in the send pkts function	Tushar Vyavahare
	Update send_pkts() to handle multiple sockets for sending packets. Multiple TX sockets are utilized alternately based on the batch size for improve packet transmission. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-7-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Remove unnecessary parameter from pkt_set() function call	Tushar Vyavahare
	The pkt_set() function no longer needs the umem parameter. This commit removes the umem parameter from the pkt_set() function. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-6-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Iterate over all the sockets in the receive pkts function	Tushar Vyavahare
	Improve the receive_pkt() function to enable it to receive packets from multiple sockets. Define a sock_num variable to iterate through all the sockets in the Rx path. Add nb_valid_entries to check that all the expected number of packets are received. Revise the function __receive_pkts() to only inspect the receive ring once, handle any received packets, and promptly return. Implement a bitmap to store the value of number of sockets. Update Makefile to include find_bit.c for compiling xskxceiver. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-5-tushar.vyavahare@intel.com