linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-08-14	selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT	Yonghong Song
	[ Upstream commit 7015843afcaf68c132784c89528dfddc0005e483 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11	selftests: mptcp: join: check backup support in signal endp	Matthieu Baerts (NGI0)
	commit f833470c27832136d4416d8fc55d658082af0989 upstream. Before the previous commit, 'signal' endpoints with the 'backup' flag were ignored when sending the MP_JOIN. The MPTCP Join selftest has then been modified to validate this case: the "single address, backup" test, is now validating the MP_JOIN with a backup flag as it is what we expect it to do with such name. The previous version has been kept, but renamed to "single address, switch to backup" to avoid confusions. The "single address with port, backup" test is also now validating the MPJ with a backup flag, which makes more sense than checking the switch to backup with an MP_PRIO. The "mpc backup both sides" test is now validating that the backup flag is also set in MP_JOIN from and to the addresses used in the initial subflow, using the special ID 0. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-11	selftests: mptcp: join: validate backup in MPJ	Matthieu Baerts (NGI0)
	commit 935ff5bb8a1cfcdf8e60c8f5c794d0bbbc234437 upstream. A peer can notify the other one that a subflow has to be treated as "backup" by two different ways: either by sending a dedicated MP_PRIO notification, or by setting the backup flag in the MP_JOIN handshake. The selftests were previously monitoring the former, but not the latter. This is what is now done here by looking at these new MIB counters when validating the 'backup' cases: MPTcpExtMPJoinSynBackupRx MPTcpExtMPJoinSynAckBackupRx The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it will help to validate a new fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-11	selftests: mptcp: always close input's FD if opened	Liu Jing
	commit 7c70bcc2a84cf925f655ea1ac4b8088062b144a3 upstream. In main_loop_s function, when the open(cfg_input, O_RDONLY) function is run, the last fd is not closed if the "--cfg_repeat > 0" branch is not taken. Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Signed-off-by: Liu Jing <liujing@cmss.chinamobile.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-11	perf tool: fix dereferencing NULL al->maps	Casey Chen
	[ Upstream commit 4c17736689ccfc44ec7dcc472577f25c34cf8724 ] With 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions"), when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR), thread__find_map() could return with al->maps being NULL. The path below could add a callchain_cursor_node with NULL ms.maps. add_callchain_ip() thread__find_symbol(.., &al) thread__find_map(.., &al) // al->maps becomes NULL ms.maps = maps__get(al.maps) callchain_cursor_append(..., &ms, ...) node->ms.maps = maps__get(ms->maps) Then the path below would dereference NULL maps and get segfault. fill_callchain_info() maps__machine(node->ms.maps); Fix it by checking if maps is NULL in fill_callchain_info(). Fixes: 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: yzhong@purestorage.com Link: https://lore.kernel.org/r/20240722211548.61455-1-cachen@purestorage.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	tools/resolve_btfids: Fix comparison of distinct pointer types warning in ↵	Liwei Song
	resolve_btfids [ Upstream commit 13c9b702e6cb8e406d5fa6b2dca422fa42d2f13e ] Add a type cast for set8->pairs to fix below compile warning: main.c: In function 'sets_patch': main.c:699:50: warning: comparison of distinct pointer types lacks a cast 699 \| BUILD_BUG_ON(set8->pairs != &set8->pairs[0].id); \| ^~ Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h") Signed-off-by: Liwei Song <liwei.song.lsong@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240722083305.4009723-1-liwei.song.lsong@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	libbpf: Fix no-args func prototype BTF dumping syntax	Andrii Nakryiko
	[ Upstream commit 189f1a976e426011e6a5588f1d3ceedf71fe2965 ] For all these years libbpf's BTF dumper has been emitting not strictly valid syntax for function prototypes that have no input arguments. Instead of `int (blah)()` we should emit `int (blah)(void)`. This is not normally a problem, but it manifests when we get kfuncs in vmlinux.h that have no input arguments. Due to compiler internal specifics, we get no BTF information for such kfuncs, if they are not declared with proper `(void)`. The fix is trivial. We also need to adjust a few ancient tests that happily assumed `()` is correct. Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/bpf/20240712224442.282823-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: fexit_sleep: Fix stack allocation for arm64	Puranjay Mohan
	[ Upstream commit e1ef78dce9b7b0fa7f9d88bb3554441d74d33b34 ] On ARM64 the stack pointer should be aligned at a 16 byte boundary or the SPAlignmentFault can occur. The fexit_sleep selftest allocates the stack for the child process as a character array, this is not guaranteed to be aligned at 16 bytes. Because of the SPAlignmentFault, the child process is killed before it can do the nanosleep call and hence fentry_cnt remains as 0. This causes the main thread to hang on the following line: while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2); Fix this by allocating the stack using mmap() as described in the example in the man page of clone(). Remove the fexit_sleep test from the DENYLIST of arm64. Fixes: eddbe8e65214 ("selftest/bpf: Add a test to check trampoline freeing logic.") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240715173327.8657-1-puranjay@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/sigaltstack: Fix ppc64 GCC build	Michael Ellerman
	commit 17c743b9da9e0d073ff19fd5313f521744514939 upstream. Building the sigaltstack test with GCC on 64-bit powerpc errors with: gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas In file included from sas.c:23: current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent" 22 \| #error "implement current_stack_pointer equivalent" \| ^~~~~ sas.c: In function ‘my_usr1’: sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’? 50 \| if (sp < (unsigned long)sstack \|\| \| ^~ This happens because GCC doesn't define __ppc__ for 64-bit builds, only 32-bit builds. Instead use __powerpc__ to detect powerpc builds, which is defined by clang and GCC for 64-bit and 32-bit builds. Fixes: 05107edc9101 ("selftests: sigaltstack: fix -Wuninitialized") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240520062647.688667-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	perf stat: Fix the hard-coded metrics calculation on the hybrid	Kan Liang
	commit 3612ca8e2935c4c142d99e33b8effa7045ce32b5 upstream. The hard-coded metrics is wrongly calculated on the hybrid machine. $ perf stat -e cycles,instructions -a sleep 1 Performance counter stats for 'system wide': 18,205,487 cpu_atom/cycles/ 9,733,603 cpu_core/cycles/ 9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle 4,268,965 cpu_core/instructions/ # 0.23 insn per cycle The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44. When finding the metric events, the find_stat() doesn't take the PMU type into account. The cpu_atom/cycles/ is wrongly used to calculate the IPC of the cpu_core. In the hard-coded metrics, the events from a different PMU are only SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type, STAT_NSECS. Except the SW CLOCK events, check the PMU type as well. Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value") Reported-by: Khalil, Amiri <amiri.khalil@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240606180316.4122904-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	tools/memory-model: Fix bug in lock.cat	Alan Stern
	commit 4c830eef806679dc243e191f962c488dd9d00708 upstream. Andrea reported that the following innocuous litmus test: C T {} P0(spinlock_t *x) { int r0; spin_lock(x); spin_unlock(x); r0 = spin_is_locked(x); } gives rise to a nonsensical empty result with no executions: $ herd7 -conf linux-kernel.cfg T.litmus Test T Required States 0 Ok Witnesses Positive: 0 Negative: 0 Condition forall (true) Observation T Never 0 0 Time T 0.00 Hash=6fa204e139ddddf2cb6fa963bad117c0 The problem is caused by a bug in the lock.cat part of the LKMM. Its computation of the rf relation for RU (read-unlocked) events is faulty; it implicitly assumes that every RU event must read from either a UL (unlock) event in another thread or from the lock's initial state. Neither is true in the litmus test above, so the computation yields no possible executions. The lock.cat code tries to make up for this deficiency by allowing RU events outside of critical sections to read from the last po-previous UL event. But it does this incorrectly, trying to keep these rfi links separate from the rfe links that might also be needed, and passing only the latter to herd7's cross() macro. The problem is fixed by merging the two sets of possible rf links for RU events and using them all in the call to cross(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Andrea Parri <parri.andrea@gmail.com> Closes: https://lore.kernel.org/linux-arch/ZlC0IkzpQdeGj+a3@andrea/ Tested-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Andrea Parri <parri.andrea@gmail.com> Fixes: 15553dcbca06 ("tools/memory-model: Add model support for spin_is_locked()") CC: <stable@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	selftests/landlock: Add cred_transfer test	Mickaël Salaün
	commit cc374782b6ca0fd634482391da977542443d3368 upstream. Check that keyctl(KEYCTL_SESSION_TO_PARENT) preserves the parent's restrictions. Fixes: e1199815b47b ("selftests/landlock: Add user space tests") Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240724.Ood5aige9she@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	ipv4: Fix incorrect TOS in fibmatch route get reply	Ido Schimmel
	[ Upstream commit f036e68212c11e5a7edbb59b5e25299341829485 ] The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed. However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1 Fix by instead returning the DSCP field from the FIB result structure which was populated during the route lookup. Output after the patch: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 Extend the existing selftests to not only verify that the correct route is returned, but that it is also returned with correct "tos" value (or without it). Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	perf intel-pt: Fix exclude_guest setting	Adrian Hunter
	[ Upstream commit b40934ae32232140e85dc7dc1c3ea0e296986723 ] In the past, the exclude_guest setting has had no effect on Intel PT tracing, but that may not be the case in the future. Set the flag correctly based upon whether KVM is using Intel PT "Host/Guest" mode, which is determined by the kvm_intel module parameter pt_mode: pt_mode=0 System-wide mode : host and guest output to host buffer pt_mode=1 Host/Guest mode : host/guest output to host/guest buffers respectively Fixes: 6e86bfdc4a60 ("perf intel-pt: Support decoding of guest kernel") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-3-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	perf intel-pt: Fix aux_watermark calculation for 64-bit size	Adrian Hunter
	[ Upstream commit 36b4cd990a8fd3f5b748883050e9d8c69fe6398d ] aux_watermark is a u32. For a 64-bit size, cap the aux_watermark calculation at UINT_MAX instead of truncating it to 32-bits. Fixes: 874fc35cdd55 ("perf intel-pt: Use aux_watermark") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-2-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	perf report: Fix condition in sort__sym_cmp()	Namhyung Kim
	[ Upstream commit cb39d05e67dc24985ff9f5150e71040fa4d60ab8 ] It's expected that both hist entries are in the same hists when comparing two. But the current code in the function checks one without dso sort key and other with the key. This would make the condition true in any case. I guess the intention of the original commit was to add '!' for the right side too. But as it should be the same, let's just remove it. Fixes: 69849fc5d2119 ("perf hists: Move sort__has_dso into struct perf_hpp_list") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-2-namhyung@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	perf pmus: Fixes always false when compare duplicates aliases	Junhao He
	[ Upstream commit dd9a426eade634bf794c7e0f1b0c6659f556942f ] In the previous loop, all the members in the aliases[j-1] have been freed and set to NULL. But in this loop, the function pmu_alias_is_duplicate() compares the aliases[j] with the aliases[j-1] that has already been disposed, so the function will always return false and duplicate aliases will never be discarded. If we find duplicate aliases, it skips the zfree aliases[j], which is accompanied by a memory leak. We can use the next aliases[j+1] to theck for duplicate aliases to fixes the aliases NULL pointer dereference, then goto zfree code snippet to release it. After patch testing: $ perf list --unit=hisi_sicl,cpa pmu uncore cpa: cpa_p0_rd_dat_32b [Number of read ops transmitted by the P0 port which size is 32 bytes. Unit: hisi_sicl,cpa] cpa_p0_rd_dat_64b [Number of read ops transmitted by the P0 port which size is 64 bytes. Unit: hisi_sicl,cpa] Fixes: c3245d2093c1 ("perf pmu: Abstract alias/event struct") Signed-off-by: Junhao He <hejunhao3@huawei.com> Cc: ravi.bangoria@amd.com Cc: james.clark@arm.com Cc: prime.zeng@hisilicon.com Cc: cuigaosheng1@huawei.com Cc: jonathan.cameron@huawei.com Cc: linuxarm@huawei.com Cc: yangyicong@huawei.com Cc: robh@kernel.org Cc: renyu.zj@linux.alibaba.com Cc: kjain@linux.ibm.com Cc: john.g.garry@oracle.com Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240614094318.11607-1-hejunhao3@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	perf test: Make test_arm_callgraph_fp.sh more robust	James Clark
	[ Upstream commit ff16aeb9b83441b8458d4235496cf320189a0c60 ] The 2 second sleep can cause the test to fail on very slow network file systems because Perf ends up being killed before it finishes starting up. Fix it by making the leafloop workload end after a fixed time like the other workloads so there is no need to kill it after 2 seconds. Also remove the 1 second start sampling delay because it is similarly fragile. Instead, search through all samples for a matching one, rather than just checking the first sample and hoping it's in the right place. Fixes: cd6382d82752 ("perf test arm64: Test unwinding using fame-pointer (fp) mode") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: German Gomez <german.gomez@arm.com> Cc: Spoorthy S <spoorts2@in.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240612140316.3006660-1-james.clark@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests: forwarding: devlink_lib: Wait for udev events after reloading	Amit Cohen
	[ Upstream commit f67a90a0c8f5b3d0acc18f10650d90fec44775f9 ] Lately, an additional locking was added by commit c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()"). The locking protects dev_uevent() calling. This function is used to send messages from the kernel to user space. Uevent messages notify user space about changes in device states, such as when a device is added, removed, or changed. These messages are used by udev (or other similar user-space tools) to apply device-specific rules. After reloading devlink instance, udev events should be processed. This locking causes a short delay of udev events handling. One example for useful udev rule is renaming ports. 'forwading.config' can be configured to use names after udev rules are applied. Some tests run devlink_reload() and immediately use the updated names. This worked before the above mentioned commit was pushed, but now the delay of uevent messages causes that devlink_reload() returns before udev events are handled and tests fail. Adjust devlink_reload() to not assume that udev events are already processed when devlink reload is done, instead, wait for udev events to ensure they are processed before returning from the function. Without this patch: TESTS='rif_mac_profile' ./resource_scale.sh TEST: 'rif_mac_profile' 4 [ OK ] sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory Cannot find device "swp1" Cannot find device "swp2" TEST: setup_wait_dev (: Interface swp1 does not come up.) [FAIL] With this patch: $ TESTS='rif_mac_profile' ./resource_scale.sh TEST: 'rif_mac_profile' 4 [ OK ] TEST: 'rif_mac_profile' overflow 5 [ OK ] This is relevant not only for this test. Fixes: bc7cbb1e9f4c ("selftests: forwarding: Add devlink_lib.sh") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/89367666e04b38a8993027f1526801ca327ab96a.1720709333.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/resctrl: Fix closing IMC fds on error and open-code R+W instead of ↵	Ilpo Järvinen
	loops [ Upstream commit c44000b6535dc9806b9128d1aed403862b2adab9 ] The imc perf fd close() calls are missing from all error paths. In addition, get_mem_bw_imc() handles fds in a for loop but close() is based on two fixed indexes READ and WRITE. Open code inner for loops to READ+WRITE entries for clarity and add a function to close() IMC fds properly in all cases. Fixes: 7f4d257e3a2a ("selftests/resctrl: Add callback to start a benchmark") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/resctrl: Convert perror() to ksft_perror() or ksft_print_msg()	Ilpo Järvinen
	[ Upstream commit cc8ff7f5c85c076297b18fb9f6d45ec5569d3d44 ] The resctrl selftest code contains a number of perror() calls. Some of them come with hash character and some don't. The kselftest framework provides ksft_perror() that is compatible with test output formatting so it should be used instead of adding custom hash signs. Some perror() calls are too far away from anything that sets error. For those call sites, ksft_print_msg() must be used instead. Convert perror() to ksft_perror() or ksft_print_msg(). Other related changes: - Remove hash signs - Remove trailing stops & newlines from ksft_perror() - Add terminating newlines for converted ksft_print_msg() - Use consistent capitalization - Small fixes/tweaks to typos & grammar of the messages - Extract error printing out of PARENT_EXIT() to be able to differentiate Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Stable-dep-of: c44000b6535d ("selftests/resctrl: Fix closing IMC fds on error and open-code R+W instead of loops") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/resctrl: Move run_benchmark() to a more fitting file	Maciej Wieczor-Retman
	[ Upstream commit 508934b5d15ab79fd5895cc2a6063bc9d95f6a55 ] resctrlfs.c contains mostly functions that interact in some way with resctrl FS entries while functions inside resctrl_val.c deal with measurements and benchmarking. run_benchmark() is located in resctrlfs.c even though it's purpose is not interacting with the resctrl FS but to execute cache checking logic. Move run_benchmark() to resctrl_val.c just before resctrl_val() that makes use of run_benchmark(). Make run_benchmark() static since it's not used between multiple files anymore. Remove return comment from kernel-doc since the function is type void. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Stable-dep-of: c44000b6535d ("selftests/resctrl: Fix closing IMC fds on error and open-code R+W instead of loops") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: Close obj in error path in xdp_adjust_tail	Geliang Tang
	[ Upstream commit 52b49ec1b2c78deb258596c3b231201445ef5380 ] If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj" opened before this should be closed. So use "goto out" to close it instead of using "return" here. Fixes: 110221081aac ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: Null checks for links in bpf_tcp_ca	Geliang Tang
	[ Upstream commit eef0532e900c20a6760da829e82dac3ee18688c5 ] Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch platform, some "Segmentation fault" errors occur: ''' test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524 #29/1 bpf_tcp_ca/dctcp:FAIL test_cubic:PASS:bpf_cubic__open_and_load 0 nsec test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524 #29/2 bpf_tcp_ca/cubic:FAIL test_dctcp_fallback:PASS:dctcp_skel 0 nsec test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec test_dctcp_fallback:FAIL:dctcp link unexpected error: -524 #29/4 bpf_tcp_ca/dctcp_fallback:FAIL test_write_sk_pacing:PASS:open_and_load 0 nsec test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524 #29/6 bpf_tcp_ca/write_sk_pacing:FAIL test_update_ca:PASS:open 0 nsec test_update_ca:FAIL:attach_struct_ops unexpected error: -524 settcpca:FAIL:setsockopt unexpected setsockopt: \ actual -1 == expected -1 (network_helpers.c:99: errno: No such file or directory) \ Failed to call post_socket_cb start_test:FAIL:start_server_str unexpected start_server_str: \ actual -1 == expected -1 test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \ actual 0 <= expected 0 #29/9 bpf_tcp_ca/update_ca:FAIL #29 bpf_tcp_ca:FAIL Caught signal #11! Stack trace: ./test_progs(crash_handler+0x28)[0x5555567ed91c] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0] ./test_progs(bpf_link__update_map+0x80)[0x555556824a78] ./test_progs(+0x94d68)[0x5555564c4d68] ./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88] ./test_progs(+0x3bde54)[0x5555567ede54] ./test_progs(main+0x61c)[0x5555567efd54] /usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208] /usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c] ./test_progs(_start+0x48)[0x55555646bca8] Segmentation fault ''' This is because BPF trampoline is not implemented on Loongarch yet, "link" returned by bpf_map__attach_struct_ops() is NULL. test_progs crashs when this NULL link passes to bpf_link__update_map(). This patch adds NULL checks for all links in bpf_tcp_ca to fix these errors. If "link" is NULL, goto the newly added label "out" to destroy the skel. v2: - use "goto out" instead of "return" as Eduard suggested. Fixes: 06da9f3bd641 ("selftests/bpf: Test switching TCP Congestion Control algorithms.") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: Close fd in error path in drop_on_reuseport	Geliang Tang
	[ Upstream commit adae187ebedcd95d02f045bc37dfecfd5b29434b ] In the error path when update_lookup_map() fails in drop_on_reuseport in prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed. This patch fixes this by using "goto close_srv1" lable instead of "detach" to close "server1" in this case. Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	bpftool: Mount bpffs when pinmaps path not under the bpffs	Tao Chen
	[ Upstream commit da5f8fd1f0d393d5eaaba9ad8c22d1c26bb2bf9b ] As Quentin said [0], BPF map pinning will fail if the pinmaps path is not under the bpffs, like: libbpf: specified path /home/ubuntu/test/sock_ops_map is not on BPF FS Error: failed to pin all maps [0] https://github.com/libbpf/bpftool/issues/146 Fixes: 3767a94b3253 ("bpftool: add pinmaps argument to the load/loadall") Signed-off-by: Tao Chen <chen.dylane@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Quentin Monnet <qmo@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20240702131150.15622-1-chen.dylane@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	libbpf: Checking the btf_type kind when fixing variable offsets	Donglin Peng
	[ Upstream commit cc5083d1f3881624ad2de1f3cbb3a07e152cb254 ] I encountered an issue when building the test_progs from the repository [1]: $ pwd /work/Qemu/x86_64/linux-6.10-rc2/tools/testing/selftests/bpf/ $ make test_progs V=1 [...] ./tools/sbin/bpftool gen object ./ip_check_defrag.bpf.linked2.o ./ip_check_defrag.bpf.linked1.o libbpf: failed to find symbol for variable 'bpf_dynptr_slice' in section '.ksyms' Error: failed to link './ip_check_defrag.bpf.linked1.o': No such file or directory (2) [...] Upon investigation, I discovered that the btf_types referenced in the '.ksyms' section had a kind of BTF_KIND_FUNC instead of BTF_KIND_VAR: $ bpftool btf dump file ./ip_check_defrag.bpf.linked1.o [...] [2] DATASEC '.ksyms' size=0 vlen=2 type_id=16 offset=0 size=0 (FUNC 'bpf_dynptr_from_skb') type_id=17 offset=0 size=0 (FUNC 'bpf_dynptr_slice') [...] [16] FUNC 'bpf_dynptr_from_skb' type_id=82 linkage=extern [17] FUNC 'bpf_dynptr_slice' type_id=85 linkage=extern [...] For a detailed analysis, please refer to [2]. We can add a kind checking to fix the issue. [1] https://github.com/eddyz87/bpf/tree/binsort-btf-dedup [2] https://lore.kernel.org/all/0c0ef20c-c05e-4db9-bad7-2cbc0d6dfae7@oracle.com/ Fixes: 8fd27bf69b86 ("libbpf: Add BPF static linker BTF and BTF.ext support") Signed-off-by: Donglin Peng <dolinux.peng@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240619122355.426405-1-dolinux.peng@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors	Ido Schimmel
	[ Upstream commit 75d8d7a63065b18df9555dbaab0b42d4c6f20943 ] ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and newer ASICs can share the same mask if their masks only differ in up to 8 consecutive bits. For example, consider the following filters: # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop The second filter can use the same mask as the first (dst_ip/24) with a delta of 1 bit. However, the above only works because the two filters have different values in the common unmasked part (dst_ip/24). When entries have the same value in the common unmasked part they create undesired collisions in the device since many entries now have the same key. This leads to firmware errors such as [1] and to a reduced scale. Fix by adjusting the hash table key to only include the value in the common unmasked part. That is, without including the delta bits. That way the driver will detect the collision during filter insertion and spill the filter into the circuit TCAM (C-TCAM). Add a test case that fails without the fix and adjust existing cases that check C-TCAM spillage according to the above limitation. [1] mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available)) Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP") Reported-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: Check length of recv in test_sockmap	Geliang Tang
	[ Upstream commit de1b5ea789dc28066cc8dc634b6825bd6148f38b ] The value of recv in msg_loop may be negative, like EWOULDBLOCK, so it's necessary to check if it is positive before accumulating it to bytes_recvd. Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/5172563f7c7b2a2e953cef02e89fc34664a7b190.1716446893.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	selftests/bpf: Fix prog numbers in test_sockmap	Geliang Tang
	[ Upstream commit 6c8d7598dfed759bf1d9d0322b4c2b42eb7252d8 ] bpf_prog5 and bpf_prog7 are removed from progs/test_sockmap_kern.h in commit d79a32129b21 ("bpf: Selftests, remove prints from sockmap tests"), now there are only 9 progs in it, not 11: SEC("sk_skb1") int bpf_prog1(struct __sk_buff skb) SEC("sk_skb2") int bpf_prog2(struct __sk_buff skb) SEC("sk_skb3") int bpf_prog3(struct __sk_buff skb) SEC("sockops") int bpf_sockmap(struct bpf_sock_ops skops) SEC("sk_msg1") int bpf_prog4(struct sk_msg_md msg) SEC("sk_msg2") int bpf_prog6(struct sk_msg_md msg) SEC("sk_msg3") int bpf_prog8(struct sk_msg_md msg) SEC("sk_msg4") int bpf_prog9(struct sk_msg_md msg) SEC("sk_msg5") int bpf_prog10(struct sk_msg_md *msg) This patch updates the array sizes of prog_fd[], prog_attach_type[] and prog_type[] from 11 to 9 accordingly. Fixes: d79a32129b21 ("bpf: Selftests, remove prints from sockmap tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/9c10d9f974f07fcb354a43a8eca67acb2fafc587.1715926605.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03	bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer	Ivan Babrou
	[ Upstream commit f4aba3471cfb9ccf69b476463f19b4c50fef6b14 ] LLVM 17 started treating const structs as constants: * https://github.com/llvm/llvm-project/commit/0b2d5b967d98 Combined with pointer laundering via ptr_to_u64, which takes a const ptr, but in reality treats the underlying memory as mutable, this makes clang always pass zero to btf__type_by_id, which breaks full name resolution. Disassembly before (LLVM 16) and after (LLVM 17): - 8b 75 cc mov -0x34(%rbp),%esi - e8 47 8d 02 00 call 3f5b0 <btf__type_by_id> + 31 f6 xor %esi,%esi + e8 a9 8c 02 00 call 3f510 <btf__type_by_id> It's a bigger project to fix this properly (and a question whether LLVM itself should detect this), but for right now let's just fix bpftool. For more information, see this thread in bpf mailing list: * https://lore.kernel.org/bpf/CABWYdi0ymezpYsQsPv7qzpx2fWuTkoD1-wG1eT-9x-TSREFrQg@mail.gmail.com/T/ Fixes: b662000aff84 ("bpftool: Adding support for BTF program names") Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240520225149.5517-1-ivan@cloudflare.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests/bpf: Extend tcx tests to cover late tcx_entry release	Daniel Borkmann
	[ Upstream commit 5f1d18de79180deac2822c93e431bbe547f7d3ce ] Add a test case which replaces an active ingress qdisc while keeping the miniq in-tact during the transition period to the new clsact qdisc. # ./vmtest.sh -- ./test_progs -t tc_link [...] ./test_progs -t tc_link [ 3.412871] bpf_testmod: loading out-of-tree module taints kernel. [ 3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #332 tc_links_after:OK #333 tc_links_append:OK #334 tc_links_basic:OK #335 tc_links_before:OK #336 tc_links_chain_classic:OK #337 tc_links_chain_mixed:OK #338 tc_links_dev_chain0:OK #339 tc_links_dev_cleanup:OK #340 tc_links_dev_mixed:OK #341 tc_links_ingress:OK #342 tc_links_invalid:OK #343 tc_links_prepend:OK #344 tc_links_replace:OK #345 tc_links_revision:OK Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240708133130.11609-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests/vDSO: fix clang build errors and warnings	John Hubbard
	[ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ] When building with clang, via: make LLVM=1 -C tools/testing/selftests ...there are several warnings, and an error. This fixes all of those and allows these tests to run and pass. 1. Fix linker error (undefined reference to memcpy) by providing a local version of memcpy. 2. clang complains about using this form: if (g = h & 0xf0000000) ...so factor out the assignment into a separate step. 3. The code is passing a signed const char* to elf_hash(), which expects a const unsigned char *. There are several callers, so fix this at the source by allowing the function to accept a signed argument, and then converting to unsigned operations, once inside the function. 4. clang doesn't have __attribute__((externally_visible)) and generates a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem to require that attribute in order to build, run and pass tests here, so remove it. Reviewed-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Edward Liaw <edliaw@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftest/timerns: fix clang build failures for abs() calls	John Hubbard
	[ Upstream commit f76f9bc616b7320df6789241ca7d26cedcf03cf3 ] When building with clang, via: make LLVM=1 -C tools/testing/selftests ...clang warns about mismatches between the expected and required integer length being supplied to abs(3). Fix this by using the correct variant of abs(3): labs(3) or llabs(3), in these cases. Reviewed-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Andrei Vagin <avagin@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests: openvswitch: Set value to nla flags.	Adrian Moreno
	[ Upstream commit a8763466669d21b570b26160d0a5e0a2ee529d22 ] Netlink flags, although they don't have payload at the netlink level, are represented as having "True" as value in pyroute2. Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) fails with the following traceback: Traceback (most recent call last): File "[...]/ovs-dpctl.py", line 2498, in <module> sys.exit(main(sys.argv)) ^^^^^^^^^^^^^^ File "[...]/ovs-dpctl.py", line 2487, in main ovsflow.add_flow(rep["dpifindex"], flow) File "[...]/ovs-dpctl.py", line 2136, in add_flow reply = self.nlm_request( ^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request return tuple(self._genlm_request(argv, kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in nlm_request return tuple(super().nlm_request(argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put self.sendto_gate(msg, addr) File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate msg.encode() File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode offset = self.encode_nlas(offset) ^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas nla_instance.setvalue(cell[1]) File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue nlv.setvalue(nla_tuple[1]) ~~~~~~~~~^^^ IndexError: list index out of range Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests/futex: pass _GNU_SOURCE without a value to the compiler	John Hubbard
	[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ] It's slightly better to set _GNU_SOURCE in the source code, but if one must do it via the compiler invocation, then the best way to do so is this: $(CC) -D_GNU_SOURCE= ...because otherwise, if this form is used: $(CC) -D_GNU_SOURCE ...then that leads the compiler to set a value, as if you had passed in: $(CC) -D_GNU_SOURCE=1 That, in turn, leads to warnings under both gcc and clang, like this: futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined Fix this by using the "-D_GNU_SOURCE=" form. Reviewed-by: Edward Liaw <edliaw@google.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests/openat2: Fix build warnings on ppc64	Michael Ellerman
	[ Upstream commit 84b6df4c49a1cc2854a16937acd5fd3e6315d083 ] Fix warnings like: openat2_test.c: In function ‘test_openat2_flags’: openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	selftests: cachestat: Fix build warnings on ppc64	Michael Ellerman
	[ Upstream commit bc4d5f5d2debf8bb65fba188313481549ead8576 ] Fix warnings like: test_cachestat.c: In function ‘print_cachestat’: test_cachestat.c:30:38: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25	tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs	Dhananjay Ugwekar
	[ Upstream commit 43cad521c6d228ea0c51e248f8e5b3a6295a2849 ] Update cpupower's P-State frequency calculation and reporting with AMD Family 1Ah+ processors, when using the acpi-cpufreq driver. This is due to a change in the PStateDef MSR layout in AMD Family 1Ah+. Tested on 4th and 5th Gen AMD EPYC system Signed-off-by: Ananth Narayan <Ananth.Narayan@amd.com> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-18	selftests/net: fix gro.c compilation failure due to non-existent opt_ipproto_off	John Hubbard
	Linux 6.6 does not have an opt_ipproto_off variable in gro.c at all (it was added in later kernel versions), so attempting to initialize one breaks the build. Fixes: c80d53c484e8 ("selftests/net: fix uninitialized variables") Cc: <stable@vger.kernel.org> # 6.6 Reported-by: Ignat Korchagin <ignat@cloudflare.com> Closes: https://lore.kernel.org/all/8B1717DB-8C4A-47EE-B28C-170B630C4639@cloudflare.com/#t Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-18	wireguard: selftests: use acpi=off instead of -no-acpi for recent QEMU	Jason A. Donenfeld
	commit 2cb489eb8dfc291060516df313ff31f4f9f3d794 upstream. QEMU 9.0 removed -no-acpi, in favor of machine properties, so update the Makefile to use the correct QEMU invocation. Cc: stable@vger.kernel.org Fixes: b83fdcd9fb8a ("wireguard: selftests: use microvm on x86") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20240704154517.1572127-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-11	selftests: make order checking verbose in msg_zerocopy selftest	Zijian Zhang
	[ Upstream commit 7d6d8f0c8b700c9493f2839abccb6d29028b4219 ] We find that when lock debugging is on, notifications may not come in order. Thus, we have order checking outputs managed by cfg_verbose, to avoid too many outputs in this case. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	selftests: fix OOM in msg_zerocopy selftest	Zijian Zhang
	[ Upstream commit af2b7e5b741aaae9ffbba2c660def434e07aa241 ] In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg on a socket with MSG_ZEROCOPY flag, and it will recv the notifications until the socket is not writable. Typically, it will start the receiving process after around 30+ sendmsgs. However, as the introduction of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is always writable and does not get any chance to run recv notifications. The selftest always exits with OUT_OF_MEMORY because the memory used by opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a different value to trigger OOM on older kernels too. Thus, we introduce "cfg_notification_limit" to force sender to receive notifications after some number of sendmsgs. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	tools/power turbostat: Remember global max_die_id	Len Brown
	[ Upstream commit cda203388687aa075db6f8996c3c4549fa518ea8 ] This is necessary to gracefully handle sparse die_id's. no functional change Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	bpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD	Jose E. Marchesi
	[ Upstream commit 009367099eb61a4fc2af44d4eb06b6b4de7de6db ] [Changes from V1: - Use a default branch in the switch statement to initialize `val'.] GCC warns that `val' may be used uninitialized in the BPF_CRE_READ_BITFIELD macro, defined in bpf_core_read.h as: [...] unsigned long long val; \ [...] \ switch (__CORE_RELO(s, field, BYTE_SIZE)) { \ case 1: val = (const unsigned char )p; break; \ case 2: val = (const unsigned short )p; break; \ case 4: val = (const unsigned int )p; break; \ case 8: val = (const unsigned long long )p; break; \ } \ [...] val; \ } \ This patch adds a default entry in the switch statement that sets `val' to zero in order to avoid the warning, and random values to be used in case __builtin_preserve_field_info returns unexpected values for BPF_FIELD_BYTE_SIZE. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240508101313.16662-1-jose.marchesi@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	selftests/net: fix uninitialized variables	John Hubbard
	[ Upstream commit eb709b5f6536636dfb87b85ded0b2af9bb6cd9e6 ] When building with clang, via: make LLVM=1 -C tools/testing/selftest ...clang warns about three variables that are not initialized in all cases: 1) The opt_ipproto_off variable is used uninitialized if "testname" is not "ip". Willem de Bruijn pointed out that this is an actual bug, and suggested the fix that I'm using here (thanks!). 2) The addr_len is used uninitialized, but only in the assert case, which bails out, so this is harmless. 3) The family variable in add_listener() is only used uninitialized in the error case (neither IPv4 nor IPv6 is specified), so it's also harmless. Fix by initializing each variable. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20240506190204.28497-1-jhubbard@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	selftests/bpf: dummy_st_ops should reject 0 for non-nullable params	Eduard Zingerman
	[ Upstream commit 6a2d30d3c5bf9f088dcfd5f3746b04d84f2fab83 ] Check if BPF_PROG_TEST_RUN for bpf_dummy_struct_ops programs rejects execution if NULL is passed for non-nullable parameter. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240424012821.595216-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops	Eduard Zingerman
	[ Upstream commit f612210d456a0b969a0adca91e68dbea0e0ea301 ] dummy_st_ops.test_2 and dummy_st_ops.test_sleepable do not have their 'state' parameter marked as nullable. Update dummy_st_ops.c to avoid passing NULL for such parameters, as the next patch would allow kernel to enforce this restriction. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240424012821.595216-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11	selftests/bpf: adjust dummy_st_ops_success to detect additional error	Eduard Zingerman
	[ Upstream commit 3b3b84aacb4420226576c9732e7b539ca7b79633 ] As reported by Jose E. Marchesi in off-list discussion, GCC and LLVM generate slightly different code for dummy_st_ops_success/test_1(): SEC("struct_ops/test_1") int BPF_PROG(test_1, struct bpf_dummy_ops_state state) { int ret; if (!state) return 0xf2f3f4f5; ret = state->val; state->val = 0x5a; return ret; } GCC-generated LLVM-generated ---------------------------- --------------------------- 0: r1 = (u64 )(r1 + 0x0) 0: w0 = -0xd0c0b0b 1: if r1 == 0x0 goto 5f 1: r1 = (u64 )(r1 + 0x0) 2: r0 = (s32 )(r1 + 0x0) 2: if r1 == 0x0 goto 6f 3: (u32 )(r1 + 0x0) = 0x5a 3: r0 = (u32 )(r1 + 0x0) 4: exit 4: w2 = 0x5a 5: r0 = -0xd0c0b0b 5: (u32 *)(r1 + 0x0) = r2 6: exit 6: exit If the 'state' argument is not marked as nullable in net/bpf/bpf_dummy_struct_ops.c, the verifier would assume that 'r1 == 0x0' is never true: - for the GCC version, this means that instructions #5-6 would be marked as dead and removed; - for the LLVM version, all instructions would be marked as live. The test dummy_st_ops/dummy_init_ret_value actually sets the 'state' parameter to NULL. Therefore, when the 'state' argument is not marked as nullable, the GCC-generated version of the code would trigger a NULL pointer dereference at instruction #3. This patch updates the test_1() test case to always follow a shape similar to the GCC-generated version above, in order to verify whether the 'state' nullability is marked correctly. Reported-by: Jose E. Marchesi <jemarch@gnu.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240424012821.595216-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05	cxl/region: check interleave capability	Yao Xingtao
	[ Upstream commit 84328c5acebc10c8cdcf17283ab6c6d548885bfc ] Since interleave capability is not verified, if the interleave capability of a target does not match the region need, committing decoder should have failed at the device end. In order to checkout this error as quickly as possible, driver needs to check the interleave capability of target during attaching it to region. Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register), bits 11 and 12 indicate the capability to establish interleaving in 3, 6, 12 and 16 ways. If these bits are not set, the target cannot be attached to a region utilizing such interleave ways. Additionally, bits 8 and 9 represent the capability of the bits used for interleaving in the address, Linux tracks this in the cxl_port interleave_mask. Per CXL specification r3.1(8.2.4.20.13 Decoder Protection): eIW means encoded Interleave Ways. eIG means encoded Interleave Granularity. in HPA: if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used, the interleave bits are none, the following check is ignored. if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits start at bit position eIG + 8 and end at eIG + eIW + 8 - 1. if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits start at bit position eIG + 8 and end at eIG + eIW - 1. if the interleave mask is insufficient to cover the required interleave bits, the target cannot be attached to the region. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>