Age | Commit message (Collapse) | Author |
|
Currently, timerlat top displays the timerlat tracer latency results, saving
the intuitive timerlat trace for the developer to analyze.
This patch goes a step forward in the automaton of the scheduling latency
analysis by providing a summary of the root cause of a latency higher than
the passed "stop tracing" parameter if the trace stops.
The output is intuitive enough for non-expert users to have a general idea
of the root cause by looking at each factor's contribution percentage while
keeping the technical detail in the output for more expert users to start
an in dept debug or to correlate a root cause with an existing one.
The terminology is in line with recent industry and academic publications
to facilitate the understanding of both audiences.
Here is one example of tool output:
----------------------------------------- %< -----------------------------------------------------
# taskset -c 0 timerlat -a 40 -c 1-23 -q
Timer Latency
0 00:00:12 | IRQ Timer Latency (us) | Thread Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max
1 #12322 | 0 0 1 15 | 10 3 9 31
2 #12322 | 3 0 1 12 | 10 3 9 23
3 #12322 | 1 0 1 21 | 8 2 8 34
4 #12322 | 1 0 1 17 | 10 2 11 33
5 #12322 | 0 0 1 12 | 8 3 8 25
6 #12322 | 1 0 1 14 | 16 3 11 35
7 #12322 | 0 0 1 14 | 9 2 8 29
8 #12322 | 1 0 1 22 | 9 3 9 34
9 #12322 | 0 0 1 14 | 8 2 8 24
10 #12322 | 1 0 0 12 | 9 3 8 24
11 #12322 | 0 0 0 15 | 6 2 7 29
12 #12321 | 1 0 0 13 | 5 3 8 23
13 #12319 | 0 0 1 14 | 9 3 9 26
14 #12321 | 1 0 0 13 | 6 2 8 24
15 #12321 | 1 0 1 15 | 12 3 11 27
16 #12318 | 0 0 1 13 | 7 3 10 24
17 #12319 | 0 0 1 13 | 11 3 9 25
18 #12318 | 0 0 0 12 | 8 2 8 20
19 #12319 | 0 0 1 18 | 10 2 9 28
20 #12317 | 0 0 0 20 | 9 3 8 34
21 #12318 | 0 0 0 13 | 8 3 8 28
22 #12319 | 0 0 1 11 | 8 3 10 22
23 #12320 | 28 0 1 28 | 41 3 11 41
rtla timerlat hit stop tracing
## CPU 23 hit stop tracing, analyzing it ##
IRQ handler delay: 27.49 us (65.52 %)
IRQ latency: 28.13 us
Timerlat IRQ duration: 9.59 us (22.85 %)
Blocking thread: 3.79 us (9.03 %)
objtool:49256 3.79 us
Blocking thread stacktrace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> __sysvec_apic_timer_interrupt
-> sysvec_apic_timer_interrupt
-> asm_sysvec_apic_timer_interrupt
-> _raw_spin_unlock_irqrestore
-> cgroup_rstat_flush_locked
-> cgroup_rstat_flush_irqsafe
-> mem_cgroup_flush_stats
-> mem_cgroup_wb_stats
-> balance_dirty_pages
-> balance_dirty_pages_ratelimited_flags
-> btrfs_buffered_write
-> btrfs_do_write_iter
-> vfs_write
-> __x64_sys_pwrite64
-> do_syscall_64
-> entry_SYSCALL_64_after_hwframe
------------------------------------------------------------------------
Thread latency: 41.96 us (100%)
The system has exit from idle latency!
Max timerlat IRQ latency from idle: 17.48 us in cpu 4
Saving trace to timerlat_trace.txt
----------------------------------------- >% -----------------------------------------------------
In this case, the major factor was the delay suffered by the IRQ handler
that handles timerlat wakeup: 65.52 %. This can be caused by the
current thread masking interrupts, which can be seen in the blocking
thread stacktrace: the current thread (objtool:49256) disabled interrupts
via raw spin lock operations inside mem cgroup, while doing write
syscall in a btrfs file system.
A simple search for the function name on Google shows that this is
a legit case for disabling the interrupts:
cgroup: Use irqsave in cgroup_rstat_flush_locked()
lore.kernel.org/linux-mm/20220301122143.1521823-2-bigeasy@linutronix.de/
The output also prints other reasons for the latency root cause, such as:
- an IRQ that happened before the IRQ handler that caused delays
- The interference from NMI, IRQ, Softirq, and Threads
The details about how these factors affect the scheduling latency
can be found here:
https://bristot.me/demystifying-the-real-time-linux-latency/
Link: https://lkml.kernel.org/r/3d45f40e630317f51ac6d678e2d96d310e495729.1675179318.git.bristot@kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently, timerlat displays a summary of the timerlat tracer results
saving the trace if the system hits a stop condition.
While this represented a huge step forward, the root cause was not
that is accessible to non-expert users.
The auto-analysis fulfill this gap by parsing the trace timerlat runs,
printing an intuitive auto-analysis.
Link: https://lkml.kernel.org/r/1ee073822f6a2cbb33da0c817331d0d4045e837f.1675179318.git.bristot@kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
A few scripts in tools/power still refer to this older debugfs path, so
let's update them to avoid confusion.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
benchmarking
The test tool can check that the zerocopy number of completions value is
valid taking into consideration the number of datagram send calls. This can
catch the system into a state where the datagrams are still in the system
(for example in a qdisk, waiting for the network interface to return a
completion notification, etc).
This change adds a retry logic of computing the number of completions up to
a configurable (via CLI) timeout (default: 2 seconds).
Fixes: 79ebc3c26010 ("net/udpgso_bench_tx: options to exercise TX CMSG")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
"udpgro_bench.sh" invokes udpgso_bench_rx/udpgso_bench_tx programs
subsequently and while doing so, there is a chance that the rx one is not
ready to accept socket connections. This racing bug could fail the test
with at least one of the following:
./udpgso_bench_tx: connect: Connection refused
./udpgso_bench_tx: sendmsg: Connection refused
./udpgso_bench_tx: write: Connection refused
This change addresses this by making udpgro_bench.sh wait for the rx
program to be ready before firing off the tx one - up to a 10s timeout.
Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Leaving unrecognized arguments buried in the output, can easily hide a
CLI/script typo. Avoid this by exiting when wrong arguments are provided to
the udpgso_bench test programs.
Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-2-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This change fixes the following compiler warning:
/usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may
be used uninitialized [-Wmaybe-uninitialized]
40 | __error_noreturn (__status, __errnum, __format,
__va_arg_pack ());
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
udpgso_bench_rx.c: In function ‘main’:
udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here
253 | int ret, len, gso_size, budget = 256;
Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The linux/net_tstamp.h is included more than once, thus clean it up.
Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/202301311440516312161@zte.com.cn
|
|
We only need to consume TX completion instead of refilling 'fill' ring.
It's currently not an issue because we never RX more than 8 packets.
Fixes: e2a46d54d7a1 ("selftests/bpf: Verify xdp_metadata xdp->af_xdp path")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230201233640.367646-1-sdf@google.com
|
|
A statically linked executable can have a .plt due to IFUNCs, in which
case .symtab is used not .dynsym. Check the section header link to see
if that is the case, and then use symtab instead.
Example:
Before:
$ cat tstifunc.c
#include <stdio.h>
void thing1(void)
{
printf("thing1\n");
}
void thing2(void)
{
printf("thing2\n");
}
typedef void (*thing_fn_t)(void);
thing_fn_t thing_ifunc(void)
{
int x;
if (x & 1)
return thing2;
return thing1;
}
void thing(void) __attribute__ ((ifunc ("thing_ifunc")));
int main()
{
thing();
return 0;
}
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -static -Wall -Wextra -Wno-uninitialized -o tstifuncstatic tstifunc.c
$ readelf -SW tstifuncstatic | grep 'Name\|plt\|dyn'
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 4] .rela.plt RELA 00000000004002e8 0002e8 000258 18 AI 29 20 8
[ 6] .plt PROGBITS 0000000000401020 001020 000190 00 AX 0 0 16
[20] .got.plt PROGBITS 00000000004c5000 0c4000 0000e0 08 WA 0 0 8
$ perf record -e intel_pt//u --filter 'filter main @ ./tstifuncstatic' ./tstifuncstatic
thing1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data ]
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
15786.690189535: tr strt 0 [unknown] => 4017cd main+0x0
15786.690189535: tr end call 4017d5 main+0x8 => 401170 [unknown]
15786.690197660: tr strt 0 [unknown] => 4017da main+0xd
15786.690197660: tr end return 4017e0 main+0x13 => 401c1a __libc_start_call_main+0x6a
After:
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
15786.690189535: tr strt 0 [unknown] => 4017cd main+0x0
15786.690189535: tr end call 4017d5 main+0x8 => 401170 thing_ifunc@plt+0x0
15786.690197660: tr strt 0 [unknown] => 4017da main+0xd
15786.690197660: tr end return 4017e0 main+0x13 => 401c1a __libc_start_call_main+0x6a
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
A static executable can have a .plt due to the presence of IFUNCs. In
that case the .plt does not have a header. Check for whether there is a
header by comparing the number of entries to the number of relocation
entries.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For x86_64, the GNU linker is putting IFUNC information in the relocation
addend, so use it to try to find a symbol for plt entries that refer to
IFUNCs.
Example:
Before:
$ cat tstpltlib.c
void fn1(void) {}
void fn2(void) {}
void fn3(void) {}
void fn4(void) {}
$ cat tstpltifunc.c
#include <stdio.h>
void thing1(void)
{
printf("thing1\n");
}
void thing2(void)
{
printf("thing2\n");
}
typedef void (*thing_fn_t)(void);
thing_fn_t thing_ifunc(void)
{
int x;
if (x & 1)
return thing2;
return thing1;
}
void thing(void) __attribute__ ((ifunc ("thing_ifunc")));
void fn1(void);
void fn2(void);
void fn3(void);
void fn4(void);
int main()
{
fn4();
fn1();
thing();
fn2();
fn3();
return 0;
}
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c
$ gcc -Wall -Wextra -Wno-uninitialized -o tstpltifunc tstpltifunc.c -L . -ltstpltlib -Wl,-rpath="$(pwd)"
$ readelf -rW tstpltifunc | grep -A99 plt
Relocation section '.rela.plt' at offset 0x738 contains 8 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000003f98 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0
0000000000003fa8 0000000400000007 R_X86_64_JUMP_SLOT 0000000000000000 __stack_chk_fail@GLIBC_2.4 + 0
0000000000003fb0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 fn1 + 0
0000000000003fb8 0000000600000007 R_X86_64_JUMP_SLOT 0000000000000000 fn3 + 0
0000000000003fc0 0000000800000007 R_X86_64_JUMP_SLOT 0000000000000000 fn4 + 0
0000000000003fc8 0000000900000007 R_X86_64_JUMP_SLOT 0000000000000000 fn2 + 0
0000000000003fd0 0000000b00000007 R_X86_64_JUMP_SLOT 0000000000000000 getrandom@GLIBC_2.25 + 0
0000000000003fa0 0000000000000025 R_X86_64_IRELATIVE 125d
$ perf record -e intel_pt//u --filter 'filter main @ ./tstpltifunc' ./tstpltifunc
thing2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
21860.073683659: tr strt 0 [unknown] => 561e212c42be main+0x0
21860.073683659: tr end call 561e212c42c6 main+0x8 => 561e212c4110 fn4@plt+0x0
21860.073683661: tr strt 0 [unknown] => 561e212c42cb main+0xd
21860.073683661: tr end call 561e212c42cb main+0xd => 561e212c40f0 fn1@plt+0x0
21860.073683661: tr strt 0 [unknown] => 561e212c42d0 main+0x12
21860.073683661: tr end call 561e212c42d0 main+0x12 => 561e212c40d0 offset_0x10d0@plt+0x0
21860.073698451: tr strt 0 [unknown] => 561e212c42d5 main+0x17
21860.073698451: tr end call 561e212c42d5 main+0x17 => 561e212c4120 fn2@plt+0x0
21860.073698451: tr strt 0 [unknown] => 561e212c42da main+0x1c
21860.073698451: tr end call 561e212c42da main+0x1c => 561e212c4100 fn3@plt+0x0
21860.073698452: tr strt 0 [unknown] => 561e212c42df main+0x21
21860.073698452: tr end return 561e212c42e5 main+0x27 => 7fb51cc29d90 __libc_start_call_main+0x80
After:
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
21860.073683659: tr strt 0 [unknown] => 561e212c42be main+0x0
21860.073683659: tr end call 561e212c42c6 main+0x8 => 561e212c4110 fn4@plt+0x0
21860.073683661: tr strt 0 [unknown] => 561e212c42cb main+0xd
21860.073683661: tr end call 561e212c42cb main+0xd => 561e212c40f0 fn1@plt+0x0
21860.073683661: tr strt 0 [unknown] => 561e212c42d0 main+0x12
21860.073683661: tr end call 561e212c42d0 main+0x12 => 561e212c40d0 thing_ifunc@plt+0x0
21860.073698451: tr strt 0 [unknown] => 561e212c42d5 main+0x17
21860.073698451: tr end call 561e212c42d5 main+0x17 => 561e212c4120 fn2@plt+0x0
21860.073698451: tr strt 0 [unknown] => 561e212c42da main+0x1c
21860.073698451: tr end call 561e212c42da main+0x1c => 561e212c4100 fn3@plt+0x0
21860.073698452: tr strt 0 [unknown] => 561e212c42df main+0x21
21860.073698452: tr end return 561e212c42e5 main+0x27 => 7fb51cc29d90 __libc_start_call_main+0x80
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To assist with synthesizing plt symbols for IFUNCs, record whether a
symbol is an alias of an IFUNC symbol.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For x86, with the addition of IFUNCs, relocation information becomes
disordered with respect to plt. Correct that by sorting the relocations by
offset.
Example:
Before:
$ cat tstpltlib.c
void fn1(void) {}
void fn2(void) {}
void fn3(void) {}
void fn4(void) {}
$ cat tstpltifunc.c
#include <stdio.h>
void thing1(void)
{
printf("thing1\n");
}
void thing2(void)
{
printf("thing2\n");
}
typedef void (*thing_fn_t)(void);
thing_fn_t thing_ifunc(void)
{
int x;
if (x & 1)
return thing2;
return thing1;
}
void thing(void) __attribute__ ((ifunc ("thing_ifunc")));
void fn1(void);
void fn2(void);
void fn3(void);
void fn4(void);
int main()
{
fn4();
fn1();
thing();
fn2();
fn3();
return 0;
}
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c
$ gcc -Wall -Wextra -Wno-uninitialized -o tstpltifunc tstpltifunc.c -L . -ltstpltlib -Wl,-rpath="$(pwd)"
$ readelf -rW tstpltifunc | grep -A99 plt
Relocation section '.rela.plt' at offset 0x738 contains 8 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000003f98 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0
0000000000003fa8 0000000400000007 R_X86_64_JUMP_SLOT 0000000000000000 __stack_chk_fail@GLIBC_2.4 + 0
0000000000003fb0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 fn1 + 0
0000000000003fb8 0000000600000007 R_X86_64_JUMP_SLOT 0000000000000000 fn3 + 0
0000000000003fc0 0000000800000007 R_X86_64_JUMP_SLOT 0000000000000000 fn4 + 0
0000000000003fc8 0000000900000007 R_X86_64_JUMP_SLOT 0000000000000000 fn2 + 0
0000000000003fd0 0000000b00000007 R_X86_64_JUMP_SLOT 0000000000000000 getrandom@GLIBC_2.25 + 0
0000000000003fa0 0000000000000025 R_X86_64_IRELATIVE 125d
$ perf record -e intel_pt//u --filter 'filter main @ ./tstpltifunc' ./tstpltifunc
thing2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.029 MB perf.data ]
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
20417.302513948: tr strt 0 [unknown] => 5629a74892be main+0x0
20417.302513948: tr end call 5629a74892c6 main+0x8 => 5629a7489110 fn2@plt+0x0
20417.302513949: tr strt 0 [unknown] => 5629a74892cb main+0xd
20417.302513949: tr end call 5629a74892cb main+0xd => 5629a74890f0 fn3@plt+0x0
20417.302513950: tr strt 0 [unknown] => 5629a74892d0 main+0x12
20417.302513950: tr end call 5629a74892d0 main+0x12 => 5629a74890d0 __stack_chk_fail@plt+0x0
20417.302528114: tr strt 0 [unknown] => 5629a74892d5 main+0x17
20417.302528114: tr end call 5629a74892d5 main+0x17 => 5629a7489120 getrandom@plt+0x0
20417.302528115: tr strt 0 [unknown] => 5629a74892da main+0x1c
20417.302528115: tr end call 5629a74892da main+0x1c => 5629a7489100 fn4@plt+0x0
20417.302528115: tr strt 0 [unknown] => 5629a74892df main+0x21
20417.302528115: tr end return 5629a74892e5 main+0x27 => 7ff14da29d90 __libc_start_call_main+0x80
After:
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
20417.302513948: tr strt 0 [unknown] => 5629a74892be main+0x0
20417.302513948: tr end call 5629a74892c6 main+0x8 => 5629a7489110 fn4@plt+0x0
20417.302513949: tr strt 0 [unknown] => 5629a74892cb main+0xd
20417.302513949: tr end call 5629a74892cb main+0xd => 5629a74890f0 fn1@plt+0x0
20417.302513950: tr strt 0 [unknown] => 5629a74892d0 main+0x12
20417.302513950: tr end call 5629a74892d0 main+0x12 => 5629a74890d0 offset_0x10d0@plt+0x0
20417.302528114: tr strt 0 [unknown] => 5629a74892d5 main+0x17
20417.302528114: tr end call 5629a74892d5 main+0x17 => 5629a7489120 fn2@plt+0x0
20417.302528115: tr strt 0 [unknown] => 5629a74892da main+0x1c
20417.302528115: tr end call 5629a74892da main+0x1c => 5629a7489100 fn3@plt+0x0
20417.302528115: tr strt 0 [unknown] => 5629a74892df main+0x21
20417.302528115: tr end return 5629a74892e5 main+0x27 => 7ff14da29d90 __libc_start_call_main+0x80
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The section .plt.sec was originally added for MPX and was first called
.plt.bnd. While MPX has been deprecated, .plt.sec is now also used for
IBT. On x86_64, IBT may be enabled by default, but can be switched off
using gcc option -fcf-protection=none, or switched on by -z ibt or -z
ibtplt. On 32-bit, option -z ibt or -z ibtplt will enable IBT.
With .plt.sec, calls are made into .plt.sec instead of .plt, so it makes
more sense to put the symbols there instead of .plt. A notable
difference is that .plt.sec does not have a header entry.
For x86, when synthesizing symbols for plt, use offset and entry size of
.plt.sec instead of .plt when there is a .plt.sec section.
Example on Ubuntu 22.04 gcc 11.3:
Before:
$ cat tstpltlib.c
void fn1(void) {}
void fn2(void) {}
void fn3(void) {}
void fn4(void) {}
$ cat tstplt.c
void fn1(void);
void fn2(void);
void fn3(void);
void fn4(void);
int main()
{
fn4();
fn1();
fn2();
fn3();
return 0;
}
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c
$ gcc -Wall -Wextra -z ibt -o tstplt tstplt.c -L . -ltstpltlib -Wl,-rpath=$(pwd)
$ readelf -SW tstplt | grep 'plt\|Name'
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[11] .rela.plt RELA 0000000000000698 000698 000060 18 AI 6 24 8
[13] .plt PROGBITS 0000000000001020 001020 000050 10 AX 0 0 16
[14] .plt.got PROGBITS 0000000000001070 001070 000010 10 AX 0 0 16
[15] .plt.sec PROGBITS 0000000000001080 001080 000040 10 AX 0 0 16
$ perf record -e intel_pt//u --filter 'filter main @ ./tstplt' ./tstplt
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data ]
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
38970.522546686: tr strt 0 [unknown] => 55fc222a81a9 main+0x0
38970.522546686: tr end call 55fc222a81b1 main+0x8 => 55fc222a80a0 [unknown]
38970.522546687: tr strt 0 [unknown] => 55fc222a81b6 main+0xd
38970.522546687: tr end call 55fc222a81b6 main+0xd => 55fc222a8080 [unknown]
38970.522546688: tr strt 0 [unknown] => 55fc222a81bb main+0x12
38970.522546688: tr end call 55fc222a81bb main+0x12 => 55fc222a80b0 [unknown]
38970.522546688: tr strt 0 [unknown] => 55fc222a81c0 main+0x17
38970.522546688: tr end call 55fc222a81c0 main+0x17 => 55fc222a8090 [unknown]
38970.522546689: tr strt 0 [unknown] => 55fc222a81c5 main+0x1c
38970.522546894: tr end return 55fc222a81cb main+0x22 => 7f3a4dc29d90 __libc_start_call_main+0x80
After:
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
38970.522546686: tr strt 0 [unknown] => 55fc222a81a9 main+0x0
38970.522546686: tr end call 55fc222a81b1 main+0x8 => 55fc222a80a0 fn4@plt+0x0
38970.522546687: tr strt 0 [unknown] => 55fc222a81b6 main+0xd
38970.522546687: tr end call 55fc222a81b6 main+0xd => 55fc222a8080 fn1@plt+0x0
38970.522546688: tr strt 0 [unknown] => 55fc222a81bb main+0x12
38970.522546688: tr end call 55fc222a81bb main+0x12 => 55fc222a80b0 fn2@plt+0x0
38970.522546688: tr strt 0 [unknown] => 55fc222a81c0 main+0x17
38970.522546688: tr end call 55fc222a81c0 main+0x17 => 55fc222a8090 fn3@plt+0x0
38970.522546689: tr strt 0 [unknown] => 55fc222a81c5 main+0x1c
38970.522546894: tr end return 55fc222a81cb main+0x22 => 7f3a4dc29d90 __libc_start_call_main+0x80
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In 32-bit executables the .plt entry size can be set to 4 when it is really
16. In fact the only sizes used for x86 (32 or 64 bit) are 8 or 16, so
check for those and, if not, use the alignment to choose which it is.
Example on Ubuntu 22.04 gcc 11.3:
Before:
$ cat tstpltlib.c
void fn1(void) {}
void fn2(void) {}
void fn3(void) {}
void fn4(void) {}
$ cat tstplt.c
void fn1(void);
void fn2(void);
void fn3(void);
void fn4(void);
int main()
{
fn4();
fn1();
fn2();
fn3();
return 0;
}
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -m32 -Wall -Wextra -shared -o libtstpltlib32.so tstpltlib.c
$ gcc -m32 -Wall -Wextra -o tstplt32 tstplt.c -L . -ltstpltlib32 -Wl,-rpath=$(pwd)
$ perf record -e intel_pt//u --filter 'filter main @ ./tstplt32' ./tstplt32
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
$ readelf -SW tstplt32 | grep 'plt\|Name'
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[10] .rel.plt REL 0000041c 00041c 000028 08 AI 5 22 4
[12] .plt PROGBITS 00001030 001030 000060 04 AX 0 0 16 <- ES is 0x04, should be 0x10
[13] .plt.got PROGBITS 00001090 001090 000008 08 AX 0 0 8
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
17894.383903029: tr strt 0 [unknown] => 565b81cd main+0x0
17894.383903029: tr end call 565b81d4 main+0x7 => 565b80d0 __x86.get_pc_thunk.bx+0x0
17894.383903031: tr strt 0 [unknown] => 565b81d9 main+0xc
17894.383903031: tr end call 565b81df main+0x12 => 565b8070 [unknown]
17894.383903032: tr strt 0 [unknown] => 565b81e4 main+0x17
17894.383903032: tr end call 565b81e4 main+0x17 => 565b8050 [unknown]
17894.383903033: tr strt 0 [unknown] => 565b81e9 main+0x1c
17894.383903033: tr end call 565b81e9 main+0x1c => 565b8080 [unknown]
17894.383903033: tr strt 0 [unknown] => 565b81ee main+0x21
17894.383903033: tr end call 565b81ee main+0x21 => 565b8060 [unknown]
17894.383903237: tr strt 0 [unknown] => 565b81f3 main+0x26
17894.383903237: tr end return 565b81fc main+0x2f => f7c21519 [unknown]
After:
$ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso
17894.383903029: tr strt 0 [unknown] => 565b81cd main+0x0
17894.383903029: tr end call 565b81d4 main+0x7 => 565b80d0 __x86.get_pc_thunk.bx+0x0
17894.383903031: tr strt 0 [unknown] => 565b81d9 main+0xc
17894.383903031: tr end call 565b81df main+0x12 => 565b8070 fn4@plt+0x0
17894.383903032: tr strt 0 [unknown] => 565b81e4 main+0x17
17894.383903032: tr end call 565b81e4 main+0x17 => 565b8050 fn1@plt+0x0
17894.383903033: tr strt 0 [unknown] => 565b81e9 main+0x1c
17894.383903033: tr end call 565b81e9 main+0x1c => 565b8080 fn2@plt+0x0
17894.383903033: tr strt 0 [unknown] => 565b81ee main+0x21
17894.383903033: tr end call 565b81ee main+0x21 => 565b8060 fn3@plt+0x0
17894.383903237: tr strt 0 [unknown] => 565b81f3 main+0x26
17894.383903237: tr end return 565b81fc main+0x2f => f7c21519 [unknown]
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230131131625.6964-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Test “Use vfs_getname probe to get syscall args filenames” fails in
environment with missing libtraceevent support as below:
82: Use vfs_getname probe to get syscall args filenames :
--- start ---
test child forked, pid 304726
Recording open file:
event syntax error: 'probe:vfs_getname*'
\___ unsupported tracepoint
libtraceevent is necessary for tracepoint support
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
test child finished with -1
---- end ----
Use vfs_getname probe to get syscall args filenames: FAILED!
The environment has debuginfo but is missing the libtraceevent devel.
Hence perf is compiled without libtraceevent support. The test tries to
add probe “probe:vfs_getname” and then uses it with “perf record”. This
fails at function “parse_events_add_tracepoint" due to missing
libtraceevent.
Similarly "probe libc's inet_pton & backtrace it with ping" test slso
fails with same reason.
Add a function in 'perf test shell' library to check if perf record with
—dry-run reports any error on missing support for libtraceevent. Update
both the tests to use this new function “skip_no_probe_record_support”
before proceeding With using probe point via perf builtin record.
With the change,
82: Use vfs_getname probe to get syscall args filenames :
--- start ---
test child forked, pid 305014
Recording open file:
libtraceevent is necessary for tracepoint support
test child finished with -2
---- end ----
Use vfs_getname probe to get syscall args filenames: Skip
81: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 305036
libtraceevent is necessary for tracepoint support
test child finished with -2
---- end ----
probe libc's inet_pton & backtrace it with ping: Skip
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com,
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/r/20230201180421.59640-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
record+probe_libc_inet_pton test
The "probe libc's inet_pton & backtrace it with ping" test installs a
uprobe and uses perf record/script to check the backtrace. Currently
even if the "perf record" fails, the test reports success. Logs below:
# ./perf test -v "probe libc's inet_pton & backtrace it with ping"
81: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 304211
failed to open /tmp/perf.data.Btf: No such file or directory
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
Fix this by adding check for presence of perf.data file
before proceeding with "perf script".
With the patch changes, test reports fail correctly.
# ./perf test -v "probe libc's inet_pton & backtrace it with ping"
81: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 304358
FAIL: perf record failed to create "/tmp/perf.data.Uoi"
test child finished with -1
---- end ----
probe libc's inet_pton & backtrace it with ping: FAILED!
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/r/20230201180421.59640-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The test_pipe() function will check perf report and perf inject with
pipe input.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230131023350.1903992-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We should not call lseek(2) for pipes as it won't work. And we already
in the proper place to read the data for AUXTRACE. Add the comment like
in the PERF_RECORD_HEADER_TRACING_DATA.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230131023350.1903992-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When it processes AUXTRACE_INFO, it calls to auxtrace_queue_data() to
collect AUXTRACE data first. That won't work with pipe since it needs
lseek() to read the scattered aux data.
$ perf record -o- -e intel_pt// true | perf report -i- --itrace=i100
# To display the perf.data header info, please use --header/--header-only options.
#
0x4118 [0xa0]: failed to process type: 70
Error:
failed to process sample
For the pipe mode, it can handle the aux data as it gets. But there's
no guarantee it can get the aux data in time. So the following warning
will be shown at the beginning:
WARNING: Intel PT with pipe mode is not recommended.
The output cannot relied upon. In particular,
time stamps and the order of events may be incorrect.
Fixes: dbd134322e74f19d ("perf intel-pt: Add support for decoding AUX area samples")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230131023350.1903992-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The ifname char pointer is taken directly from the command line
as input and the string is copied directly into struct ifreq
via strcpy. This makes it easy to corrupt other members of ifreq
and generally do stack overflows.
Most often the ioctl will fail with:
./xdp_hw_metadata: ioctl(SIOCETHTOOL): Bad address
As people will likely copy-paste code for getting NIC queue
channels (rxq_num) and enabling HW timestamping (hwtstamp_ioctl)
lets make this code a bit more secure by using strncpy.
Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/167527272543.937063.16993147790832546209.stgit@firesoul
|
|
The glibc error reporting function error():
void error(int status, int errnum, const char *format, ...);
The status argument should be a positive value between 0-255 as it
is passed over to the exit(3) function as the value as the shell exit
status. The least significant byte of status (i.e., status & 0xFF) is
returned to the shell parent.
Fix this by using 1 instead of -1. As 1 corresponds to C standard
constant EXIT_FAILURE.
Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/167527272038.937063.9137108142012298120.stgit@firesoul
|
|
Using xdp_hw_metadata I experince Segmentation fault after seeing
"detaching bpf program....".
On my system the segfault happened when accessing bpf_obj->skeleton
in xdp_hw_metadata__destroy(bpf_obj) call. That doesn't make any sense
as this memory have not been freed by program at this point in time.
Prior to calling xdp_hw_metadata__destroy(bpf_obj) the function
close_xsk() is called for each RX-queue xsk. The real bug lays
in close_xsk() that unmap via munmap() the wrong memory pointer.
The call xsk_umem__delete(xsk->umem) will free xsk->umem, thus
the call to munmap(xsk->umem, UMEM_SIZE) will have unpredictable
behavior. And man page explain subsequent references to these
pages will generate SIGSEGV.
Unmapping xsk->umem_area instead removes the segfault.
Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/167527271533.937063.5717065138099679142.stgit@firesoul
|
|
The AF_XDP userspace part of xdp_hw_metadata see non-zero as a signal of
the availability of rx_timestamp and rx_hash in data_meta area. The
kernel-side BPF-prog code doesn't initialize these members when kernel
returns an error e.g. -EOPNOTSUPP. This memory area is not guaranteed to
be zeroed, and can contain garbage/previous values, which will be read
and interpreted by AF_XDP userspace side.
Tested this on different drivers. The experiences are that for most
packets they will have zeroed this data_meta area, but occasionally it
will contain garbage data.
Example of failure tested on ixgbe:
poll: 1 (0)
xsk_ring_cons__peek: 1
0x18ec788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
rx_hash: 3697961069
rx_timestamp: 9024981991734834796 (sec:9024981991.7348)
0x18ec788: complete idx=8 addr=8000
Converting to date:
date -d @9024981991
2255-12-28T20:26:31 CET
I choose a simple fix in this patch. When kfunc fails or isn't supported
assign zero to the corresponding struct meta value.
It's up to the individual BPF-programmer to do something smarter e.g.
that fits their use-case, like getting a software timestamp and marking
a flag that gives the type of timestamp.
Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/167527271027.937063.5177725618616476592.stgit@firesoul
|
|
The function close_xsk() unmap via munmap() the wrong memory pointer.
The call xsk_umem__delete(xsk->umem) have already freed xsk->umem.
Thus the call to munmap(xsk->umem, UMEM_SIZE) will have unpredictable
behavior that can lead to Segmentation fault elsewhere, as man page
explain subsequent references to these pages will generate SIGSEGV.
Fixes: e2a46d54d7a1 ("selftests/bpf: Verify xdp_metadata xdp->af_xdp path")
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/167527517464.938135.13750760520577765269.stgit@firesoul
|
|
kfuncs are allowed to be static, or not use one or more of their
arguments. For example, bpf_xdp_metadata_rx_hash() in net/core/xdp.c is
meant to be implemented by drivers, with the default implementation just
returning -EOPNOTSUPP. As described in [0], such kfuncs can have their
arguments elided, which can cause BTF encoding to be skipped. The new
__bpf_kfunc macro should address this, and this patch adds a selftest
which verifies that a static kfunc with at least one unused argument can
still be encoded and invoked by a BPF program.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230201173016.342758-5-void@manifault.com
|
|
Now that we have the __bpf_kfunc tag, we should use add it to all
existing kfuncs to ensure that they'll never be elided in LTO builds.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
|
|
In copy_bytes(), it reads the data from the (input) fd and writes it to
the output file. But it does with the read(2) unconditionally which
caused a problem of mixing buffered vs unbuffered I/O together.
You can see the problem when using pipes.
$ perf record -e intel_pt// -o- true | perf inject -b > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
0x45c0 [0x30]: failed to process type: 71
It should use perf_data__read() to honor the 'use_stdio' setting.
Fixes: 601366678c93618f ("perf data: Allow to use stdio functions for pipe mode")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230131023350.1903992-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pull virtio fixes from Michael Tsirkin:
"Just small bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa: ifcvf: Do proper cleanup if IFCVF init fails
vhost-scsi: unbreak any layout for response
tools/virtio: fix the vringh test for virtio ring changes
vhost/net: Clear the pending messages when the backend is removed
|
|
The current signal handling tests for SME do not account for the fact that
unlike SVE all SME vector lengths are optional so we can't guarantee that
we will encounter the minimum possible VL, they will hang enumerating VLs
on such systems. Abort enumeration when we find the lowest VL in the newly
added ssve_za_regs test.
Fixes: bc69da5ff087 ("kselftest/arm64: Verify simultaneous SSVE and ZA context generation")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230131-arm64-kselftest-sig-sme-no-128-v1-2-d47c13dc8e1e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The current signal handling tests for SME do not account for the fact that
unlike SVE all SME vector lengths are optional so we can't guarantee that
we will encounter the minimum possible VL, they will hang enumerating VLs
on such systems. Abort enumeration when we find the lowest VL.
Fixes: 4963aeb35a9e ("kselftest/arm64: signal: Add SME signal handling tests")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230131-arm64-kselftest-sig-sme-no-128-v1-1-d47c13dc8e1e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
During early development a dependedncy was added on having FA64
available so we could use the full FPSIMD register set in the signal
handler. Subsequently the ABI was finialised so the handler is run with
streaming mode disabled meaning this is redundant but the dependency was
never removed, do so now.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230131-arm64-kselfetest-ssve-fa64-v1-1-f418efcc2b60@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The existing users of these helpers have been converted to iproute2 dcb.
Drop the helpers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up default port priority through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The scripts require Python 3 and some distros are dropping
Python 2 support.
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The CLI script tries to validate jsonschema by default.
It's seems better to validate too many times than too few.
However, when copying the scripts to random servers having
to install jsonschema is tedious. Load jsonschema via
importlib, and let the user opt out.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When I wrote the first version of the Python code I was quite
excited that we can generate class methods directly from the
spec. Unfortunately we need to use valid identifiers for method
names (specifically no dashes are allowed). Don't reuse those
names on the CLI, it's much more natural to use the operation
names exactly as listed in the spec.
Instead of:
./cli --do rings_get
use:
./cli --do rings-get
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
One of my favorite features of the Netlink specs is that they
make decoding structured extack a ton easier.
Implement pretty printing bad attribute names in YNL.
For example it will now say:
'bad-attr': '.header.flags'
rather than the useless:
'bad-attr-offs': 32
Proof:
$ ./cli.py --spec ethtool.yaml --do rings_get \
--json '{"header":{"dev-index":1, "flags":4}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr': '.header.flags'}
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool uses mutli-attr, add the support to YNL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support families which use different IDs for messages
to and from the kernel.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool needs support for handful of extra types.
It doesn't have the definitions section yet.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adapt the common object hierarchy in code gen and CLI.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a lot of copy and pasting going on between the "cli"
and code gen when it comes to representing the parsed spec.
Create a library which both can use.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the CLI code out of samples/ and the library part
of it into tools/net/ynl/lib/. This way we can start
sharing some code with the code gen.
Initially I thought that code gen is too C-specific to
share anything but basic stuff like calculating values
for enums can easily be shared.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An earlier fix tried to address generated code jumping around
one code-gen run to another. Turns out dict()s are already
ordered since Python 3.7, the problem is that we iterate over
operation modes using a set(). Sets are unordered in Python.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Calculate average value in osnoise-hist summary with two-digit
precision to avoid displaying too optimitic results.
Link: https://lkml.kernel.org/r/20230103103400.275566-3-br015@umbiko.net
Signed-off-by: Andreas Ziegler <br015@umbiko.net>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Sampled durations must be weighted by observed quantity, to arrive at a correct
average duration value.
Perform calculation of total duration by summing (duration * count).
Link: https://lkml.kernel.org/r/20230103103400.275566-2-br015@umbiko.net
Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")
Signed-off-by: Andreas Ziegler <br015@umbiko.net>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|