linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-11-13	tools: ynl: extend CFLAGS to keep options from environment	Jan Stancek
	Package build environments like Fedora rpmbuild introduced hardening options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS and LDFLAGS. ynl Makefiles currently override CFLAGS but not LDFLAGS, which leads to a mismatch and build failure: CC sample devlink /usr/bin/ld: devlink.o: relocation R_X86_64_32 against symbol `ynl_devlink_family' can not be used when making a PIE object; recompile with -fPIE /usr/bin/ld: failed to set dynamic section sizes: bad value collect2: error: ld returned 1 exit status Extend CFLAGS to support hardening options set by build environment. Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/265b2d5d3a6d4721a161219f081058ed47dc846a.1731399562.git.jstancek@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13	tools: ynl: add script dir to sys.path	Jan Stancek
	Python options like PYTHONSAFEPATH or -P [1] do not add script directory to PYTHONPATH. ynl depends on this path to build and run. [1] This option is default for Fedora rpmbuild since introduction of https://fedoraproject.org/wiki/Changes/PythonSafePath Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/b26537cdb6e1b24435b50b2ef81d71f31c630bc1.1731399562.git.jstancek@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14	tools/firewire: Fix several incorrect format specifiers	Luo Yifan
	Make a minor change to eliminate static checker warnings. Fix several incorrect format specifiers that misused signed and unsigned versions. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241113023137.291661-1-luoyifan@cmss.chinamobile.com Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-11-13	selftests/bpf: Add a test for arena range tree algorithm	Alexei Starovoitov
	Add a test that verifies specific behavior of arena range tree algorithm and adjust existing big_alloc1 test due to use of global data in arena. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20241108025616.17625-3-alexei.starovoitov@gmail.com
2024-11-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Alexei Starovoitov
	Cross-merge bpf fixes after downstream PR. In particular to bring the fix in commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long"). The follow up verifier work depends on it. And the fix in commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator"). It's fixing instability of BPF CI on s390 arch. No conflicts. Adjacent changes in: Auto-merging arch/Kconfig Auto-merging kernel/bpf/helpers.c Auto-merging kernel/bpf/memalloc.c Auto-merging kernel/bpf/verifier.c Auto-merging mm/slab_common.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpftool: Cast variable `var` to long long	Luo Yifan
	When the SIGNED condition is met, the variable `var` should be cast to `long long` instead of `unsigned long long`. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20241112073701.283362-1-luoyifan@cmss.chinamobile.com
2024-11-13	libsubcmd: Move va_end() before exit	Luo Yifan
	This patch makes a minor adjustment by moving the va_end call before exit. Since the exit() function terminates the program, any code after exit(128) (i.e., va_end(params)) is unreachable and thus not executed. Placing va_end before exit ensures that the va_list is properly cleaned up. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20241111091701.275496-1-luoyifan@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	perf timechart: Remove redundant variable assignment	Luo Yifan
	This patch makes a minor change that removes a redundant variable assignment. The assignment before the for loop is duplicated by the initialization within the loop header. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241111095209.276332-1-luoyifan@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	perf list: Fix topic and pmu_name argument order	Jean-Philippe Romain
	Fix function definitions to match header file declaration. Fix two callers to pass the arguments in the right order. On Intel Tigerlake, before: ``` $ perf list -j\|grep "\"Topic\""\|sort\|uniq "Topic": "cache", "Topic": "cpu", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j\|grep "\"Unit\""\|sort\|uniq "Unit": "cache", "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` After: ``` $ perf list -j\|grep "\"Topic\""\|sort\|uniq "Topic": "cache", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j\|grep "\"Unit\""\|sort\|uniq "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` Fixes: e5c6109f4813246a ("perf list: Reorganize to use callbacks to allow honouring command line options") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jean-Philippe Romain <jean-philippe.romain@foss.st.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Junhao He <hejunhao3@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241109025801.560378-1-irogers@google.com [ I fixed the two callers and added it to Jean-Phillippe's original change. ] Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	perf tools: Fix typos Muliplier -> Multiplier	Andrew Kreimer
	There are some typos in fprintf messages. Fix them via codespell. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108134728.25515-1-algonell@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	perf disasm: Allow configuring what disassemblers to use	Arnaldo Carvalho de Melo
	The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then parse and augment it with samples, allow navigation, etc. More recently disassemblers from the capstone and llvm (libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced. So when all those methods are available, there is a static preference for a series of attempts of disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded. This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which and in what order disassemblers should be attempted. As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, elliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods. root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8 /* Update task and its cfs_rq load average / static inline void update_load_avg(struct cfs_rq cfs_rq, struct sched_entity se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp } / rq->task_clock normalized against any time this cfs_rq has spent throttled / static inline u64 cfs_rq_clock_pelt(struct cfs_rq cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~# After adding a way to select the disassembler from the command line a 'perf test' comparing the output of the various diassemblers should be introduced, to test these codebases. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	perf disasm: Define stubs for the LLVM and capstone disassemblers	Arnaldo Carvalho de Melo
	This reduces the number of ifdefs in the main symbol__disassemble() method and paves the way for allowing the user to configure the disassemblers of preference. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-3-acme@kernel.org [ Applied fixes from Masami Hiramatsu and Aditya Bodkhe for when capstone devel files are not available ] Link: https://lore.kernel.org/r/B78FB6DF-24E9-4A3C-91C9-535765EC0E2A@ibm.com Link: https://lore.kernel.org/r/173145729034.2747044.453926054000880254.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds
	Pull bpf fixes from Daniel Borkmann: - Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye) - Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS context retrieval (Zijian Zhang) - Fix BPF bits iterator selftest for s390x (Hou Tao) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6 bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx selftests/bpf: Use -4095 as the bad address for bits iterator
2024-11-13	Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All singletons" * tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: swapfile: fix cluster reclaim work crash on rotational devices selftests: hugetlb_dio: fixup check for initial conditions to skip in the start mm/thp: fix deferred split queue not partially_mapped: fix mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases nommu: pass NULL argument to vma_iter_prealloc() ocfs2: fix UBSAN warning in ocfs2_verify_volume() nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint mm: page_alloc: move mlocked flag clearance into free_pages_prepare() mm: count zeromap read and set for swapout and swapin
2024-11-13	tools: gpio: Fix several incorrect format specifiers	Luo Yifan
	Make a minor change to eliminate static checker warnings. The variable lines[] is unsigned, so the correct format specifier should be %u instead of %d. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241113021458.291252-1-luoyifan@cmss.chinamobile.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-11-13	Merge branch 'kvm-docs-6.13' into HEAD	Paolo Bonzini
	- Drop obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect references to non-existing ioctls - List registers supported by KVM_GET/SET_ONE_REG on s390 - Use rST internal links - Reorganize the introduction to the API document
2024-11-13	Merge tag 'kvm-x86-misc-6.13' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.13 - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe, and harden the cache accessors to try to prevent similar issues from occuring in the future. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups
2024-11-13	Merge tag 'kvm-x86-selftests-6.13' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM selftests changes for 6.13 - Enable XFAM-based features by default for all selftests VMs, which will allow removing the "no AVX" restriction.
2024-11-12	selftests/bpf: Add struct_ops prog private stack tests	Yonghong Song
	Add three tests for struct_ops using private stack. ./test_progs -t struct_ops_private_stack #336/1 struct_ops_private_stack/private_stack:OK #336/2 struct_ops_private_stack/private_stack_fail:OK #336/3 struct_ops_private_stack/private_stack_recur:OK #336 struct_ops_private_stack:OK The following is a snippet of a struct_ops check_member() implementation: u32 moff = __btf_member_bit_offset(t, member) / 8; switch (moff) { case offsetof(struct bpf_testmod_ops3, test_1): prog->aux->priv_stack_requested = true; prog->aux->recursion_detected = test_1_recursion_detected; fallthrough; default: break; } return 0; The first test is with nested two different callback functions where the first prog has more than 512 byte stack size (including subprogs) with private stack enabled. The second test is a negative test where the second prog has more than 512 byte stack size without private stack enabled. The third test is the same callback function recursing itself. At run time, the jit trampoline recursion check kicks in to prevent the recursion. The recursion_detected() callback function is implemented by the bpf_testmod, the following message in dmesg bpf_testmod: oh no, recursing into test_1, recursion_misses 1 demonstrates the callback function is indeed triggered when recursion miss happens. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163938.2225528-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: Add tracing prog private stack tests	Yonghong Song
	Some private stack tests are added including: - main prog only with stack size greater than BPF_PSTACK_MIN_SIZE. - main prog only with stack size smaller than BPF_PSTACK_MIN_SIZE. - prog with one subprog having MAX_BPF_STACK stack size and another subprog having non-zero small stack size. - prog with callback function. - prog with exception in main prog or subprog. - prog with async callback without nesting - prog with async callback with possible nesting Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163927.2224750-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	torture: Add --no-affinity parameter to kvm.sh	Paul E. McKenney
	In performance tests, it can be counter-productive to spread torture-test guest OSes across sockets. Plus the experimenter might have ideas about what CPUs individual guest OSes are to run on. This commit therefore adds a --no-affinity parameter to kvm.sh to prevent it from running taskset on its guest OSes. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12	selftests/bpf: update send_signal to lower perf evemts frequency	Eduard Zingerman
	Similar to commit [1] sample perf events less often in test_send_signal_nmi(). This should reduce perf events throttling. [1] 7015843afcaf ("selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: allow send_signal test to timeout	Eduard Zingerman
	The following invocation: $ t1=send_signal/send_signal_perf_thread_remote \ t2=send_signal/send_signal_nmi_thread_remote \ ./test_progs -t $t1,$t2 Leads to send_signal_nmi_thread_remote to be stuck on a line 180: /* wait for result */ err = read(pipe_c2p[0], buf, 1); In this test case: - perf event PERF_COUNT_HW_CPU_CYCLES is created for parent process; - BPF program is attached to perf event, and sends a signal to child process when event occurs; - parent program burns some CPU in busy loop and calls read() to get notification from child that it received a signal. The perf event is declared with .sample_period = 1. This forces perf to throttle events, and under some unclear conditions the event does not always occur while parent is in busy loop. After parent enters read() system call CPU cycles event won't be generated for parent anymore. Thus, if perf event had not occurred already the test is stuck. This commit updates the parent to wait for notification with a timeout, doing several iterations of busy loop + read_with_timeout(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: add read_with_timeout() utility function	Eduard Zingerman
	int read_with_timeout(int fd, char *buf, size_t count, long usec) As a regular read(2), but allows to specify a timeout in micro-seconds. Returns -EAGAIN on timeout. Implemented using select(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: watchdog timer for test_progs	Eduard Zingerman
	This commit provides a watchdog timer that sets a limit of how long a single sub-test could run: - if sub-test runs for 10 seconds, the name of the test is printed (currently the name of the test is printed only after it finishes); - if sub-test runs for 120 seconds, the running thread is terminated with SIGSEGV (to trigger crash_handler() and get a stack trace). Specifically: - the timer is armed on each call to run_one_test(); - re-armed at each call to test__start_subtest(); - is stopped when exiting run_one_test(). Default timeout could be overridden using '-w' or '--watchdog-timeout' options. Value 0 can be used to turn the timer off. Here is an example execution: $ ./ssh-exec.sh ./test_progs -w 5 -t \ send_signal/send_signal_perf_thread_remote,send_signal/send_signal_nmi_thread_remote WATCHDOG: test case send_signal/send_signal_nmi_thread_remote executes for 5 seconds, terminating with SIGSEGV Caught signal #11! Stack trace: ./test_progs(crash_handler+0x1f)[0x9049ef] /lib64/libc.so.6(+0x40d00)[0x7f1f1184fd00] /lib64/libc.so.6(read+0x4a)[0x7f1f1191cc4a] ./test_progs[0x720dd3] ./test_progs[0x71ef7a] ./test_progs(test_send_signal+0x1db)[0x71edeb] ./test_progs[0x9066c5] ./test_progs(main+0x5ed)[0x9054ad] /lib64/libc.so.6(+0x2a088)[0x7f1f11839088] /lib64/libc.so.6(__libc_start_main+0x8b)[0x7f1f1183914b] ./test_progs(_start+0x25)[0x527385] #292 send_signal:FAIL test_send_signal_common:PASS:reading pipe 0 nsec test_send_signal_common:PASS:reading pipe error: size 0 0 nsec test_send_signal_common:PASS:incorrect result 0 nsec test_send_signal_common:PASS:pipe_write 0 nsec test_send_signal_common:PASS:setpriority 0 nsec Timer is implemented using timer_{create,start} librt API. Internally librt uses pthreads for SIGEV_THREAD timers, so this change adds a background timer thread to the test process. Because of this a few checks in tests 'bpf_iter' and 'iters' need an update to account for an extra thread. For parallelized scenario the watchdog is also created for each worker fork. If one of the workers gets stuck, it would be terminated by a watchdog. In theory, this might lead to a scenario when all worker threads are exhausted, however this should not be a problem for server_main(), as it would exit with some of the tests not run. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "x86 and selftests fixes. x86: - When emulating a guest TLB flush for a nested guest, flush vpid01, not vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if L2 and L1 are sharing VPID '0' (from L1's perspective). - Fix a bug in the SNP initialization flow where KVM would return '0' to userspace instead of -errno on failure. - Move the Intel PT virtualization (i.e. outputting host trace to host buffer and guest trace to guest buffer) behind CONFIG_BROKEN. - Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START - Fix a bug where KVM fails to inject an interrupt from the IRR after KVM_SET_LAPIC. Selftests: - Increase the timeout for the memslot performance selftest to avoid false failures on arm64 and nested x86 platforms. - Fix a goof in the guest_memfd selftest where a for-loop initialized a bit mask to zero instead of BIT(0). - Disable strict aliasing when building KVM selftests to prevent the compiler from treating things like "u64 " to "uint64_t " cases as undefined behavior, which can lead to nasty, hard to debug failures. - Force -march=x86-64-v2 for KVM x86 selftests if and only if the uarch is supported by the compiler. - Fix broken compilation of kvm selftests after a header sync in tools/" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN KVM: x86: Unconditionally set irr_pending when updating APICv state kvm: svm: Fix gctx page leak on invalid inputs KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB KVM: SVM: Propagate error from snp_guest_req_init() to userspace KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled KVM: selftests: Don't force -march=x86-64-v2 if it's unsupported KVM: selftests: Disable strict aliasing KVM: selftests: fix unintentional noop test in guest_memfd_test.c KVM: selftests: memslot_perf_test: increase guest sync timeout
2024-11-12	Merge tag 'kvm-s390-next-6.13-1' of ↵	Paolo Bonzini
	https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - second part of the ucontrol selftest - cpumodel sanity check selftest - gen17 cpumodel changes
2024-11-12	selftests: hugetlb_dio: fixup check for initial conditions to skip in the start	Donet Tom
	This test verifies that a hugepage, used as a user buffer for DIO operations, is correctly freed upon unmapping. To test this, we read the count of free hugepages before and after the mmap, DIO, and munmap operations, then check if the free hugepage count is the same. Reading free hugepages before the test was removed by commit 0268d4579901 ('selftests: hugetlb_dio: check for initial conditions to skip at the start'), causing the test to always fail. This patch adds back reading the free hugepages before starting the test. With this patch, the tests are now passing. Test results without this patch: ./tools/testing/selftests/mm/hugetlb_dio TAP version 13 1..4 # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 1 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 2 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 3 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 4 : Huge pages not freed! # Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0 Test results with this patch: /tools/testing/selftests/mm/hugetlb_dio TAP version 13 1..4 # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 1 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 2 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 3 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 4 : Huge pages freed successfully ! # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Link: https://lkml.kernel.org/r/20241110064903.23626-1-donettom@linux.ibm.com Fixes: 0268d4579901 ("selftests: hugetlb_dio: check for initial conditions to skip in the start") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-12	Merge branch 'iommufd/arm-smmuv3-nested' of iommu/linux into iommufd for-next	Jason Gunthorpe
	Common SMMUv3 patches for the following patches adding nesting, shared branch with the iommu tree. * 'iommufd/arm-smmuv3-nested' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/arm-smmu-v3: Expose the arm_smmu_attach interface iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS ACPI/IORT: Support CANWBS memory access flag ACPICA: IORT: Update for revision E.f vfio: Remove VFIO_TYPE1_NESTING_IOMMU ... Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl	Nicolin Chen
	Add a viommu_cache test function to cover vIOMMU invalidations using the updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU via its hwpt_id field. Link: https://patch.msgid.link/r/f317f902041f3d05deaee4ca3fdd8ef4b8297361.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command	Nicolin Chen
	Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache. Link: https://patch.msgid.link/r/cd4082079d75427bd67ed90c3c825e15b5720a5f.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE	Nicolin Chen
	With a vIOMMU object, use space can flush any IOMMU related cache that can be directed via a vIOMMU object. It is similar to the IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g. device/desciprtor cache. Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers can define different structures for vIOMMU invalidations v.s. HWPT ones. Since both the HWPT-based and vIOMMU-based invalidation pathways check own cache invalidation op, remove the WARN_ON_ONCE in the allocator. Update the uAPI, kdoc, and selftest case accordingly. Link: https://patch.msgid.link/r/b411e2245e303b8a964f39f49453a5dff280968f.1730836308.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage	Nicolin Chen
	Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl. Link: https://patch.msgid.link/r/4b9607e5b86726c8baa7b89bd48123fb44104a23.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage	Nicolin Chen
	Add a new iommufd_viommu FIXTURE and setup it up with a vIOMMU object. Any new vIOMMU feature will be added as a TEST_F under that. Link: https://patch.msgid.link/r/abe267c9d004b29cb1712ceba2f378209d4b7e01.1730836219.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-11-12	kselftest/arm64: Try harder to generate different keys during PAC tests	Mark Brown
	We very intermittently see failures in the single_thread_different_keys PAC test. As noted in the comment in the test the PAC field can be quite narrow so there is a chance of collisions even with different keys with a chance of 5% for 7 bit keys, and the potential for narrower keys. The test tries to avoid this by running repeatedly, but only tries 10 times which even with a 5% chance of collisions isn't enough. Increase the number of times we attempt to look for collisions by a factor of 100, this also affects other tests which are following a similar pattern with running the test repeatedly and either don't care like with pac_instruction_not_nop or potentially have the same issue like exec_sign_all. The PAC tests are very fast, running in a second or two even in emulation, so the 100x increased cost is mildly irritating but not a huge issue. The bulk of the overhead is in the exec_sign_all test which does a fork() and exec() per iteration. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241111-arm64-pac-test-collisions-v1-2-171875f37e44@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()	Mark Brown
	The PAC exec_sign_all() test spawns some child processes, creating pipes to be stdin and stdout for the child. It cleans up most of the file descriptors that are created as part of this but neglects to clean up the parent end of the child stdin and stdout. Add the missing close() calls. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241111-arm64-pac-test-collisions-v1-1-171875f37e44@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Corrupt P0 in the irritator when testing SSVE	Mark Brown
	When building for streaming SVE the irritator for SVE skips updates of both P0 and FFR. While FFR is skipped since it might not be present there is no reason to skip corrupting P0 so switch to an instruction valid in streaming mode and move the ifdef. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241107-arm64-fp-stress-irritator-v2-3-c4b9622e36ee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	rcutorture: Add light-weight SRCU scenario	Paul E. McKenney
	This commit adds an rcutorture scenario that tests light-weight SRCU readers. While in the area, it adjusts the size of the TREE10 scenario. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12	kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c	Catalin Marinas
	Compiling the child_cleanup() function results in: gcs-stress.c: In function ‘child_cleanup’: gcs-stress.c:266:75: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=] 266 \| ksft_print_msg("%s: Exited due to signal %d\n", \| ~^ \| \| \| int Add the missing child->exit_signal argument. Fixes: 05e6cfff58c4 ("kselftest/arm64: Add a GCS stress test") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Add FPMR coverage to fp-ptrace	Mark Brown
	Add coverage for FPMR to fp-ptrace. FPMR can be available independently of SVE and SME, if SME is supported then FPMR is cleared by entering and exiting streaming mode. As with other registers we generate random values to load into the register, we restrict these to bitfields which are always defined. We also leave bitfields where the valid values are affected by the set of supported FP8 formats zero to reduce complexity, it is unlikely that specific bitfields will be affected by ptrace issues. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241112-arm64-fp-ptrace-fpmr-v2-3-250b57c61254@kernel.org [catalin.marinas@arm.com: use REG_FPMR instead of FPMR] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Expand the set of ZA writes fp-ptrace does	Mark Brown
	Currently our test for implementable ZA writes is written in a bit of a convoluted fashion which excludes all changes where we clear SVCR.SM even though we can actually support that since changing the vector length resets SVCR. Make the logic more direct, enabling us to actually run these cases. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241112-arm64-fp-ptrace-fpmr-v2-2-250b57c61254@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftets/arm64: Use flag bits for features in fp-ptrace assembler code	Mark Brown
	The assembler portions of fp-ptrace are passed feature flags by the C code indicating which architectural features are supported. Currently these use an entire register for each flag which is wasteful and gets cumbersome as new flags are added. Switch to using flag bits in a single register to make things easier to maintain. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241112-arm64-fp-ptrace-fpmr-v2-1-250b57c61254@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Enable build of PAC tests with LLVM=1	Mark Brown
	Currently we don't build the PAC selftests when building with LLVM=1 since we attempt to test for PAC support in the toolchain before we've set up the build system to point at LLVM in lib.mk, which has to be one of the last things in the Makefile. Since all versions of LLVM supported for use with the kernel have PAC support we can just sidestep the issue by just assuming PAC is there when doing a LLVM=1 build. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241111-arm64-selftest-pac-clang-v1-1-08599ceee418@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	kselftest/arm64: Check that SVCR is 0 in signal handlers	Mark Brown
	We don't currently validate that we exit streaming mode and clear ZA when we enter a signal handler. Add simple checks for this in the SSVE and ZA tests. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241106-arm64-fpmr-signal-test-v1-1-31fa34ce58fe@kernel.org [catalin.marinas@arm.com: Use %lx in fprintf() as uint64_t seems to be unsigned long in glibc] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	libbpf: Stringify errno in log messages in the remaining code	Mykyta Yatsenko
	Convert numeric error codes into the string representations in log messages in the rest of libbpf source files. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-5-mykyta.yatsenko5@gmail.com
2024-11-11	libbpf: Stringify errno in log messages in btf*.c	Mykyta Yatsenko
	Convert numeric error codes into the string representations in log messages in btf.c and btf_dump.c. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-4-mykyta.yatsenko5@gmail.com
2024-11-11	libbpf: Stringify errno in log messages in libbpf.c	Mykyta Yatsenko
	Convert numeric error codes into the string representations in log messages in libbpf.c. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-3-mykyta.yatsenko5@gmail.com
2024-11-11	libbpf: Introduce errstr() for stringifying errno	Mykyta Yatsenko
	Add function errstr(int err) that allows converting numeric error codes into string representations. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-2-mykyta.yatsenko5@gmail.com
2024-11-11	tools/bpf: Fix the wrong format specifier in bpf_jit_disasm	Luo Yifan
	There is a static checker warning that the %d in format string is mismatched with the corresponding argument type, which could result in incorrect printed data. This patch fixes it. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111021004.272293-1-luoyifan@cmss.chinamobile.com
2024-11-11	net: ipv4: Cache pmtu for all packet paths if multipath enabled	Vladimir Vdovin
	Check number of paths by fib_info_num_path(), and update_or_create_fnhe() for every path. Problem is that pmtu is cached only for the oif that has received icmp message "need to frag", other oifs will still try to use "default" iface mtu. An example topology showing the problem: \| host1 +---------+ \| dummy0 \| 10.179.20.18/32 mtu9000 +---------+ +-----------+----------------+ +---------+ +---------+ \| ens17f0 \| 10.179.2.141/31 \| ens17f1 \| 10.179.2.13/31 +---------+ +---------+ \| (all here have mtu 9000) \| +------+ +------+ \| ro1 \| 10.179.2.140/31 \| ro2 \| 10.179.2.12/31 +------+ +------+ \| \| ---------+------------+-------------------+------ \| +-----+ \| ro3 \| 10.10.10.10 mtu1500 +-----+ \| ======================================== some networks ======================================== \| +-----+ \| eth0\| 10.10.30.30 mtu9000 +-----+ \| host2 host1 have enabled multipath and sysctl net.ipv4.fib_multipath_hash_policy = 1: default proto static src 10.179.20.18 nexthop via 10.179.2.12 dev ens17f1 weight 1 nexthop via 10.179.2.140 dev ens17f0 weight 1 When host1 tries to do pmtud from 10.179.20.18/32 to host2, host1 receives at ens17f1 iface an icmp packet from ro3 that ro3 mtu=1500. And host1 caches it in nexthop exceptions cache. Problem is that it is cached only for the iface that has received icmp, and there is no way that ro3 will send icmp msg to host1 via another path. Host1 now have this routes to host2: ip r g 10.10.30.30 sport 30000 dport 443 10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0 cache expires 521sec mtu 1500 ip r g 10.10.30.30 sport 30033 dport 443 10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0 cache So when host1 tries again to reach host2 with mtu>1500, if packet flow is lucky enough to be hashed with oif=ens17f1 its ok, if oif=ens17f0 it blackholes and still gets icmp msgs from ro3 to ens17f1, until lucky day when ro3 will send it through another flow to ens17f0. Signed-off-by: Vladimir Vdovin <deliran@verdict.gg> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg Signed-off-by: Jakub Kicinski <kuba@kernel.org>