Age | Commit message (Collapse) | Author |
|
This patch supports jumping from tui total cycles view to symbol source
view.
For example,
perf record -b ./div
perf report --total-cycles
In total cycles view, we can select one entry and press 'a' or press
ENTER key to jump to symbol source view.
This patch also sets sort_order to NULL in cmd_report() which will use
the default branch sort order. The percent value in new annotate view
will be consistent with the percent in annotate view switched from perf
report (we observed the original percent gap with previous patches).
v2:
---
Fix the 'make NO_SLANG=1' error. (set __maybe_unused to
annotation_opts in block_hists_tui_browse()).
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It would be nice if we could jump to the assembler/source view (like the
normal perf report) from total cycles view.
This patch moves the block_hists_tui_browse from block-info.c to
ui/browsers/hists.c in order to reuse some browser codes (i.e
do_annotate) for implementing new annotation view.
v2:
---
Fix the 'make NO_SLANG=1' error. (Change 'int block_hists_tui_browse()'
to 'static inline int block_hists_tui_browse()')
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Avoid termination of trace loading in case the last record in the
decompressed buffer partly resides in the following mmaped
PERF_RECORD_COMPRESSED record.
In this case NULL value returned by fetch_mmaped_event() means to
proceed to the next mmaped record then decompress it and load compressed
events.
The issue can be reproduced like this:
$ perf record -z -- some_long_running_workload
$ perf report --stdio -vv
decomp (B): 44519 to 163000
decomp (B): 48119 to 174800
decomp (B): 65527 to 131072
fetch_mmaped_event: head=0x1ffe0 event->header_size=0x28, mmap_size=0x20000: fuzzed perf.data?
Error:
failed to process sample
...
Testing:
71: Zstd perf.data compression/decompression : Ok
$ tools/perf/perf report -vv --stdio
decomp (B): 59593 to 262160
decomp (B): 4438 to 16512
decomp (B): 285 to 880
Looking at the vmlinux_path (8 entries long)
Using vmlinux for symbols
decomp (B): 57474 to 261248
prefetch_event: head=0x3fc78 event->header_size=0x28, mmap_size=0x3fc80: fuzzed or compressed perf.data?
decomp (B): 25 to 32
decomp (B): 52 to 120
...
Fixes: 57fc032ad643 ("perf session: Avoid infinite loop when seeing invalid header.size")
Link: https://marc.info/?l=linux-kernel&m=156580812427554&w=2
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/cf782c34-f3f8-2f9f-d6ab-145cee0d5322@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And take it into account when looking up DSOs when we have the dso_id
fields obtained from somewhere, like from PERF_RECORD_MMAP2 records.
Instances of struct map pointing to the same DSO pathname but with
anything in dso_id different are in fact different DSOs, so better have
different 'struct dso' instances to reflect that. At some point we may
want to get copies of the contents of the different objects if we want
to do correct annotation or other analysis.
With this we get 'struct map' 24 bytes leaner:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 (*map_ip)(struct map *, u64); /* 64 8 */
u64 (*unmap_ip)(struct map *, u64); /* 72 8 */
struct dso * dso; /* 80 8 */
refcount_t refcnt; /* 88 4 */
u32 flags; /* 92 4 */
/* size: 96, cachelines: 2, members: 13 */
/* sum members: 92, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* forced alignments: 1 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Drivers use different fields to report the number of channels, so take
the maximum of all data channels (rx, tx, combined) when determining the
size of the xsk map. The current code used only 'combined' which was set
to 0 in some drivers e.g. mlx4.
Tested: compiled and run xdpsock -q 3 -r -S on mlx4
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20191119001951.92930-1-lrizzo@google.com
|
|
Not used anywhere, nuke it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-teqz0eqcw43mnt7i3me44esw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We'll use it when doing DSO lookups using dso_ids.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-u2nr1oq03o0i29w2ay9jx03s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Instead of the 4 fields, a step in the direction of moving this to
struct dso.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-gp5s1xgxacurmih5d1l94ymy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And this patch highlights where these fields are being used: in the sort
order where it uses it to compare maps and classify samples taking into
account not just the DSO, but those DSO id fields.
I think these should be used to differentiate DSOs with the same name
but different 'struct dso_id' fields, i.e. these fields should move to
'struct dso' and then be used as part of the key when doing lookups for
DSOs, in addition to the DSO name.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Check configurations and packets transference with different variations
of autoneg and speed.
Test plan:
1. Test force of same speed with autoneg off
2. Test force of different speeds with autoneg off (should fail)
3. One side is autoneg on and other side sets force of common speeds
4. One side is autoneg on and other side only advertises a subset of the
common speeds (one speed of the subset)
5. One side is autoneg on and other side only advertises a subset of the
common speeds. Check that highest speed is negotiated
6. Test autoneg on, but each side advertises different speeds (should
fail)
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a function that waits for device with maximum number of iterations.
It enables to limit the waiting and prevent infinite loop.
This will be used by the subsequent patch which will set two ports to
different speeds in order to make sure they cannot negotiate a link.
Waiting for all the setup is limited with 10 minutes for each device.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Functions:
1. speeds_arr_get
The function returns an array of speed values from
/usr/include/linux/ethtool.h The array looks as follows:
[10baseT/Half] = 0,
[10baseT/Full] = 1,
...
2. ethtool_set:
params: cmd
The function runs ethtool by cmd (ethtool -s cmd) and checks if
there was an error in configuration
3. dev_speeds_get:
params: dev, with_mode (0 or 1), adver (0 or 1)
return value: Array of supported/Advertised link modes
with/without mode
* Example 1:
speeds_get swp1 0 0
return: 1000 10000 40000
* Example 2:
speeds_get swp1 1 1
return: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full
4. common_speeds_get:
params: dev1, dev2, with_mode (0 or 1), adver (0 or 1)
return value: Array of common speeds of dev1 and dev2
* Example:
common_speeds_get swp1 swp2 0 0
return: 1000 10000
Assuming that swp1 supports 1000 10000 40000 and swp2 supports
1000 10000
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale test for Spectrum-2 should only be invoked for Spectrum-2.
Skip the test otherwise.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as for Spectrum-1, test the ability to add the maximum number of
routes possible to the switch.
Invoke the test from the 'resource_scale' wrapper script.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Record the first event parsing error and report. Implementing feedback
from Jiri Olsa:
https://lkml.org/lkml/2019/10/28/680
An example error is:
$ tools/perf/perf stat -e c/c/
WARNING: multiple event parsing errors
event syntax error: 'c/c/'
\___ unknown term
valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore
Initial error:
event syntax error: 'c/c/'
\___ Cannot find PMU `c'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20191116074652.9960-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Trace a magic number as immediate value if the target variable is not
found at some probe points which is based on one probe event.
This feature is good for the case if you trace a source code line with
some local variables, which is compiled into several instructions and
some of the variables are optimized out on some instructions.
Even if so, with this feature, perf probe trace a magic number instead
of such disappeared variables and fold those probes on one event.
E.g. without this patch:
# perf probe -D "pud_page_vaddr pud"
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64
p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64
p:probe/pud_page_vaddr _text+23558082 pud=%ax:x64
p:probe/pud_page_vaddr _text+328373 pud=%r8:x64
p:probe/pud_page_vaddr _text+348448 pud=%bx:x64
p:probe/pud_page_vaddr _text+23816818 pud=%bx:x64
With this patch:
# perf probe -D "pud_page_vaddr pud" | head
spurious_kernel_fault is blacklisted function, skip it.
vmalloc_fault is blacklisted function, skip it.
p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64
p:probe/pud_page_vaddr _text+149051 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64
p:probe/pud_page_vaddr _text+315926 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23807209 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23557365 pud=%ax:x64
p:probe/pud_page_vaddr _text+314097 pud=%di:x64
p:probe/pud_page_vaddr _text+314015 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+313893 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+324083 pud=\deade12d:x64
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406476931.24476.6261475888681844285.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support DW_AT_const_value for variable assignment instead of location.
Note that this requires ftrace supporting immediate value.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406476012.24476.16096289871757175775.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support multiprobe event if the event is based on function and lines and
kernel supports it. In this case, perf probe creates the first probe
with an event, and tries to append following probes on that event, since
those probes must be on the same source code line.
Before this patch;
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18_1 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18_1 -aR sleep 1
#
After this patch (on multiprobe supported kernel)
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18 -aR sleep 1
#
Committer testing:
On a kernel that doesn't support multiprobe events, after this patch:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# grep append /sys/kernel/debug/tracing/README
be modified by appending '.descending' or '.ascending' to a
can be modified by appending any of the following modifiers
#
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18_1 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18_1 -aR sleep 1
# perf probe -l
probe:vfs_read_L18 (on vfs_read:18@fs/read_write.c)
probe:vfs_read_L18_1 (on vfs_read:18@fs/read_write.c)
#
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406475010.24476.586290752591512351.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Generate event name from function name with line number as
<function>_L<line_number>. Note that this is only for the new event
which is defined by the line number of function (except for line 0).
If there is another event on same line, you have to use
"-f" option. In that case, the new event has "_1" suffix.
e.g.
# perf probe -a kernel_read:2
Added new event:
probe:kernel_read_L2 (on kernel_read:2)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read_L2 -aR sleep 1
But if we omit the line number or 0th line, it will
have no suffix.
# perf probe -a kernel_read:0
Added new event:
probe:kernel_read (on kernel_read)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
probe:kernel_read (on kernel_read@linux-5.0.0/fs/read_write.c)
probe:kernel_read_L2 (on kernel_read:2@linux-5.0.0/fs/read_write.c)
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406474026.24476.2828897745502059569.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since perf probe -L shows non representive lines, it can be mislead
users where user can put probes. This prevents to show such non
representive lines so that user can understand which lines user can
probe.
# perf probe -L kernel_read
<kernel_read@/build/linux-pvZVvI/linux-5.0.0/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
2 mm_segment_t old_fs;
ssize_t result;
old_fs = get_fs();
6 set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
Committer testing:
Before:
# perf probe -L kernel_read
<kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
1 {
2 mm_segment_t old_fs;
3 ssize_t result;
5 old_fs = get_fs();
6 set_fs(KERNEL_DS);
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
#
See the 1, 3, 5 lines? They shouldn't be there, after this patch:
# perf probe -L kernel_read
<kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
2 mm_segment_t old_fs;
ssize_t result;
old_fs = get_fs();
6 set_fs(KERNEL_DS);
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
#
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406473064.24476.2913278267727587314.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Verify user given probe line is a representive line (which doesn't share
the address with other lines or the line is the least line among the
lines which shares same address), and if not, it shows what is the
representive line.
Without this fix, user can put a probe on the lines which is not a a
representive line. But since this is not a representive line, perf probe
-l shows a representive line number instead of user given line number.
e.g. (put kernel_read:3, but listed as kernel_read:2)
# perf probe -a kernel_read:3
Added new event:
probe:kernel_read (on kernel_read:3)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
# perf probe -l
probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c)
With this fix, perf probe doesn't allow user to put a probe on a
representive line, and tell what is the representive line.
# perf probe -a kernel_read:3
This line is sharing the addrees with other lines.
Please try to probe at kernel_read:2 instead.
Error: Failed to add events.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406472071.24476.14915451439785001021.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The dwarf_getsrc_die() can return the line which is not a statement nor
the least line number among the lines which shares same address.
This can lead perf probe --list shows incorrect line number for probed
address.
To fix this, this introduces cu_getsrc_die() which returns only a
statement line and which is the least line number (we call it the
representive line for an address), and use it in cu_find_lineinfo().
Also, if the given address is the entry address of a real function,
cu_find_lineinfo() returns the function declared line number instead of
the start line number of the function body.
For example, without this change perf probe -l shows incorrect line as
below.
# perf probe -a kernel_read:2
Added new event:
probe:kernel_read (on kernel_read:2)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
# perf probe -l
probe:kernel_read (on kernel_read:1@linux-5.0.0/fs/read_write.c)
With this fix, it shows correct line number as below;
# perf probe -l
probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c)
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406471067.24476.17463149618465494448.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add to the opcode map the following instructions:
cldemote
tpause
umonitor
umwait
movdiri
movdir64b
enqcmd
enqcmds
encls
enclu
enclv
pconfig
wbnoinvd
For information about the instructions, refer Intel SDM May 2019
(325462-070US) and Intel Architecture Instruction Set Extensions
May 2019 (319433-037).
The instruction decoding can be tested using the perf tools'
"x86 instruction decoder - new instructions" test as folllows:
$ perf test -v "new " 2>&1 | grep -i cldemote
Decoded ok: 0f 1c 00 cldemote (%eax)
Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8)
Decoded ok: 0f 1c 00 cldemote (%rax)
Decoded ok: 41 0f 1c 00 cldemote (%r8)
Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8)
Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8)
$ perf test -v "new " 2>&1 | grep -i tpause
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 41 0f ae f0 tpause %r8d
$ perf test -v "new " 2>&1 | grep -i umonitor
Decoded ok: 67 f3 0f ae f0 umonitor %ax
Decoded ok: f3 0f ae f0 umonitor %eax
Decoded ok: 67 f3 0f ae f0 umonitor %eax
Decoded ok: f3 0f ae f0 umonitor %rax
Decoded ok: 67 f3 41 0f ae f0 umonitor %r8d
$ perf test -v "new " 2>&1 | grep -i umwait
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 41 0f ae f0 umwait %r8d
$ perf test -v "new " 2>&1 | grep -i movdiri
Decoded ok: 0f 38 f9 03 movdiri %eax,(%ebx)
Decoded ok: 0f 38 f9 88 78 56 34 12 movdiri %ecx,0x12345678(%eax)
Decoded ok: 48 0f 38 f9 03 movdiri %rax,(%rbx)
Decoded ok: 48 0f 38 f9 88 78 56 34 12 movdiri %rcx,0x12345678(%rax)
$ perf test -v "new " 2>&1 | grep -i movdir64b
Decoded ok: 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
Decoded ok: 67 66 0f 38 f8 1c movdir64b (%si),%bx
Decoded ok: 67 66 0f 38 f8 8c 34 12 movdir64b 0x1234(%si),%cx
Decoded ok: 66 0f 38 f8 18 movdir64b (%rax),%rbx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%rax),%rcx
Decoded ok: 67 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 67 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmd
Decoded ok: f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: 67 f2 0f 38 f8 1c enqcmd (%si),%bx
Decoded ok: 67 f2 0f 38 f8 8c 34 12 enqcmd 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f2 0f 38 f8 18 enqcmd (%rax),%rbx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%rax),%rcx
Decoded ok: 67 f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: 67 f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmds
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i encls
Decoded ok: 0f 01 cf encls
Decoded ok: 0f 01 cf encls
$ perf test -v "new " 2>&1 | grep -i enclu
Decoded ok: 0f 01 d7 enclu
Decoded ok: 0f 01 d7 enclu
$ perf test -v "new " 2>&1 | grep -i enclv
Decoded ok: 0f 01 c0 enclv
Decoded ok: 0f 01 c0 enclv
$ perf test -v "new " 2>&1 | grep -i pconfig
Decoded ok: 0f 01 c5 pconfig
Decoded ok: 0f 01 c5 pconfig
$ perf test -v "new " 2>&1 | grep -i wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20191115135447.6519-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add to the "x86 instruction decoder - new instructions" test the following
instructions:
cldemote
tpause
umonitor
umwait
movdiri
movdir64b
enqcmd
enqcmds
encls
enclu
enclv
pconfig
wbnoinvd
For information about the instructions, refer Intel SDM May 2019
(325462-070US) and Intel Architecture Instruction Set Extensions
May 2019 (319433-037).
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20191115135447.6519-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently, with latest llvm trunk, selftest test_progs failed obj
file test_seg6_loop.o with the following error in verifier:
infinite loop detected at insn 76
The byte code sequence looks like below, and noted that alu32 has been
turned off by default for better generated codes in general:
48: w3 = 100
49: *(u32 *)(r10 - 68) = r3
...
; if (tlv.type == SR6_TLV_PADDING) {
76: if w3 == 5 goto -18 <LBB0_19>
...
85: r1 = *(u32 *)(r10 - 68)
; for (int i = 0; i < 100; i++) {
86: w1 += -1
87: if w1 == 0 goto +5 <LBB0_20>
88: *(u32 *)(r10 - 68) = r1
The main reason for verification failure is due to partial spills at
r10 - 68 for induction variable "i".
Current verifier only handles spills with 8-byte values. The above 4-byte
value spill to stack is treated to STACK_MISC and its content is not
saved. For the above example:
w3 = 100
R3_w=inv100 fp-64_w=inv1086626730498
*(u32 *)(r10 - 68) = r3
R3_w=inv100 fp-64_w=inv1086626730498
...
r1 = *(u32 *)(r10 - 68)
R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
fp-64=inv1086626730498
To resolve this issue, verifier needs to be extended to track sub-registers
in spilling, or llvm needs to enhanced to prevent sub-register spilling
in register allocation phase. The former will increase verifier complexity
and the latter will need some llvm "hacking".
Let us workaround this issue by declaring the induction variable as "long"
type so spilling will happen at non sub-register level. We can revisit this
later if sub-register spilling causes similar or other verification issues.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com
|
|
When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason
is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time,
starting it again every time it is expected to terminate. The exception is
the final client_connect: the server is not started anymore, which ensures
no process is kept running after the test is finished.
For a sit test, though, the script is terminated prematurely without the
final client_connect and the 'nc' process keeps running. This in turn causes
the run_one function in kselftest/runner.sh to hang forever, waiting for the
runaway process to finish.
Ensure a remaining server is terminated on cleanup.
Fixes: f6ad6accaa99 ("selftests/bpf: expand test_tc_tunnel with SIT encap")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.1573821780.git.jbenc@redhat.com
|
|
The actual test to run is test_xdping.sh, which is already in TEST_PROGS.
The xdping program alone is not runnable with 'make run_tests', it
immediatelly fails due to missing arguments.
Move xdping to TEST_GEN_PROGS_EXTENDED in order to be built but not run.
Fixes: cd5385029f1d ("selftests/bpf: measure RTT from xdp using xdping")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/4365c81198f62521344c2215909634407184387e.1573821726.git.jbenc@redhat.com
|
|
So we start with:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u32 flags; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
u64 pgoff; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 reloc; /* 64 8 */
u32 maj; /* 72 4 */
u32 min; /* 76 4 */
u64 ino; /* 80 8 */
u64 ino_generation; /* 88 8 */
u64 (*map_ip)(struct map *, u64); /* 96 8 */
u64 (*unmap_ip)(struct map *, u64); /* 104 8 */
struct dso * dso; /* 112 8 */
refcount_t refcnt; /* 120 4 */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 116, holes: 2, sum holes: 7 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* padding: 4 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
$
and 'flags' is seldom used when printing details about the map or with
the "cacheline" sort order, we can move them it to the second cacheline,
that will allow combining it with 'refcnt', that is only four bytes:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 maj; /* 64 4 */
u32 min; /* 68 4 */
u64 ino; /* 72 8 */
u64 ino_generation; /* 80 8 */
u64 (*map_ip)(struct map *, u64); /* 88 8 */
u64 (*unmap_ip)(struct map *, u64); /* 96 8 */
struct dso * dso; /* 104 8 */
refcount_t refcnt; /* 112 4 */
u32 flags; /* 116 4 */
/* size: 120, cachelines: 2, members: 17 */
/* sum members: 116, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-2cdw3zlw1mkamaf7nqtdlxfi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The map->priv and map->erange_warned are seldom used, the first only in
tests/vmlinux-kallsyms.c, the later only when hist_entry__inc_addr_samples()
returns -ERANGE in 'perf top', which are really rare occasions, so make
them a bool bitfield.
This will open up space for other members on the first cacheline.
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u32 flags; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
u64 pgoff; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 reloc; /* 64 8 */
u32 maj; /* 72 4 */
u32 min; /* 76 4 */
u64 ino; /* 80 8 */
u64 ino_generation; /* 88 8 */
u64 (*map_ip)(struct map *, u64); /* 96 8 */
u64 (*unmap_ip)(struct map *, u64); /* 104 8 */
struct dso * dso; /* 112 8 */
refcount_t refcnt; /* 120 4 */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 116, holes: 2, sum holes: 7 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* padding: 4 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-g5545pcq4ff0wr17tfb1piqt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add missing "%o" and "%X". Ext4 events use "%o" for printing i_mode.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Link: http://lore.kernel.org/lkml/157338066113.6548.11461421296091086041.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Do not dereference 'chain' when it is NULL.
$ perf record -e intel_pt//u -e branch-misses:u uname
$ perf report --itrace=l --branch-history
perf: Segmentation fault
Fixes: e9024d519d89 ("perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20191114142538.4097-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are still lots of lookups by name, even if just when loading
vmlinux, till that code is studied to figure out if its possible to do
away with those map lookup by names, provide a way to sort it using
libc's qsort/bsearch.
Doing it at the first lookup defers the sorting a bit, and as the code
stands now, is never done for user maps, just for the kernel ones.
# perf probe -l
# perf probe -x ~/bin/perf -L __map_groups__find_by_name
<__map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0>
0 static struct map *__map_groups__find_by_name(struct map_groups *mg, const char *name)
1 {
struct map **mapp;
4 if (mg->maps_by_name == NULL &&
5 map__groups__sort_by_name_from_rbtree(mg))
6 return NULL;
8 mapp = bsearch(name, mg->maps_by_name, mg->nr_maps, sizeof(*mapp), map__strcmp_name);
9 if (mapp)
10 return *mapp;
11 return NULL;
12 }
struct map *map_groups__find_by_name(struct map_groups *mg, const char *name)
{
# perf probe -x ~/bin/perf 'found=__map_groups__find_by_name:10 name:string'
Added new event:
probe_perf:found (on __map_groups__find_by_name:10 in /home/acme/bin/perf with name:string)
You can now use it in all perf tools, such as:
perf record -e probe_perf:found -aR sleep 1
#
# perf probe -x ~/bin/perf -L map_groups__find_by_name
<map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0>
0 struct map *map_groups__find_by_name(struct map_groups *mg, const char *name)
1 {
2 struct maps *maps = &mg->maps;
struct map *map;
5 down_read(&maps->lock);
7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) {
8 map = mg->last_search_by_name;
9 goto out_unlock;
}
/*
* If we have mg->maps_by_name, then the name isn't in the rbtree,
* as mg->maps_by_name mirrors the rbtree when lookups by name are
* made.
*/
16 map = __map_groups__find_by_name(mg, name);
17 if (map || mg->maps_by_name != NULL)
18 goto out_unlock;
/* Fallback to traversing the rbtree... */
21 maps__for_each_entry(maps, map)
22 if (strcmp(map->dso->short_name, name) == 0) {
23 mg->last_search_by_name = map;
24 goto out_unlock;
}
27 map = NULL;
out_unlock:
30 up_read(&maps->lock);
31 return map;
32 }
int dso__load_vmlinux(struct dso *dso, struct map *map,
const char *vmlinux, bool vmlinux_allocated)
# perf probe -x ~/bin/perf 'fallback=map_groups__find_by_name:21 name:string'
Added new events:
probe_perf:fallback (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string)
probe_perf:fallback_1 (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string)
You can now use it in all perf tools, such as:
perf record -e probe_perf:fallback_1 -aR sleep 1
#
# perf probe -l
probe_perf:fallback (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string)
probe_perf:fallback_1 (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string)
probe_perf:found (on __map_groups__find_by_name:10@util/symbol.c in /home/acme/bin/perf with name_string)
#
# perf stat -e probe_perf:*
Now run 'perf top' in another term and then, after a while, stop 'perf stat':
Furthermore, if we ask for interval printing, we can see that that is done just
at the start of the workload:
# perf stat -I1000 -e probe_perf:*
# time counts unit events
1.000319513 0 probe_perf:found
1.000319513 0 probe_perf:fallback_1
1.000319513 0 probe_perf:fallback
2.001868092 23,251 probe_perf:found
2.001868092 0 probe_perf:fallback_1
2.001868092 0 probe_perf:fallback
3.002901597 0 probe_perf:found
3.002901597 0 probe_perf:fallback_1
3.002901597 0 probe_perf:fallback
4.003358591 0 probe_perf:found
4.003358591 0 probe_perf:fallback_1
4.003358591 0 probe_perf:fallback
^C
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-c5lmbyr14x448rcfii7y6t3k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We'only populating maps for kernel modules either from perf.data file
PERF_RECORD_MMAP records or when parsing /proc/modules, so there is no
need to first look if we already have those module maps in the list,
that would mean the kernel has duplicate entries.
So ditch one use of looking up maps by name.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-gnzjg2hhuz6jnrw91m35059y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
At the end of a 'perf record' session, by default, we'll process all
samples and populate the threads, maps, etc so as to find out which of
the DSOs got samples, to reduce the size of the build-id table we'll
add to the perf.data headers.
But we don't need to process the PERF_RECORD_MMAP events synthesized
for the kernel modules, as we have those already via
perf_session__create_kernel_maps(), so add mmap/mmap2 handlers that
first look at event->header.misc to see if the event is for a user map,
bailing out if not.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mofoxvcx2dryppcw3o689jdd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
At some point in the past we needed to make sure we would get the long
name of modules and not just what we get from /proc/modules, but that
need, as described in the cset that introduced the adjustment function:
Fixes: c03d5184f0e9 ("perf machine: Adjust dso->long_name for offline module")
Without using the buildid-cache:
# lsmod | grep trusted
# insmod trusted.ko
# lsmod | grep trusted
trusted 24576 0
# strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted
openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4
openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3
openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
probe:key_seal (on key_seal in trusted)
# perf probe -l
probe:key_seal (on key_seal in trusted)
#
No attempt at opening '[trusted]'.
Now using the build-id cache:
# rmmod trusted
# perf buildid-cache --add ./trusted.ko
# insmod trusted.ko
# strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted
openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4
openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3
openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4
openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
#
Again, no attempt at reading '[trusted]'.
Finally, adding a probe to that function and then using:
[root@quaco ~]# perf trace -e probe_perf:*/max-stack=16/ --max-events=2
0.000 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263)
dso__adjust_kmod_long_name (/home/acme/bin/perf)
machine__process_kernel_mmap_event (/home/acme/bin/perf)
machine__process_mmap_event (/home/acme/bin/perf)
perf_event__process_mmap (/home/acme/bin/perf)
machines__deliver_event (/home/acme/bin/perf)
perf_session__deliver_event (/home/acme/bin/perf)
perf_session__process_event (/home/acme/bin/perf)
process_simple (/home/acme/bin/perf)
reader__process_events (/home/acme/bin/perf)
__perf_session__process_events (/home/acme/bin/perf)
perf_session__process_events (/home/acme/bin/perf)
process_buildids (/home/acme/bin/perf)
record__finish_output (/home/acme/bin/perf)
__cmd_record (/home/acme/bin/perf)
cmd_record (/home/acme/bin/perf)
run_builtin (/home/acme/bin/perf)
0.055 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263)
dso__adjust_kmod_long_name (/home/acme/bin/perf)
machine__process_kernel_mmap_event (/home/acme/bin/perf)
machine__process_mmap_event (/home/acme/bin/perf)
perf_event__process_mmap (/home/acme/bin/perf)
machines__deliver_event (/home/acme/bin/perf)
perf_session__deliver_event (/home/acme/bin/perf)
perf_session__process_event (/home/acme/bin/perf)
process_simple (/home/acme/bin/perf)
reader__process_events (/home/acme/bin/perf)
__perf_session__process_events (/home/acme/bin/perf)
perf_session__process_events (/home/acme/bin/perf)
process_buildids (/home/acme/bin/perf)
record__finish_output (/home/acme/bin/perf)
__cmd_record (/home/acme/bin/perf)
cmd_record (/home/acme/bin/perf)
run_builtin (/home/acme/bin/perf)
#
This was the only path I could find using the perf tools that reach at this
function, then as of november/2019, if we put a probe in the line where the
actuall setting of the dso->long_name is done:
# perf trace -e probe_perf:*
^C[root@quaco ~]
# perf stat -e probe_perf:* -I 2000
2.000404265 0 probe_perf:dso__adjust_kmod_long_name
4.001142200 0 probe_perf:dso__adjust_kmod_long_name
6.001704120 0 probe_perf:dso__adjust_kmod_long_name
8.002398316 0 probe_perf:dso__adjust_kmod_long_name
10.002984010 0 probe_perf:dso__adjust_kmod_long_name
12.003597851 0 probe_perf:dso__adjust_kmod_long_name
14.004113303 0 probe_perf:dso__adjust_kmod_long_name
16.004582773 0 probe_perf:dso__adjust_kmod_long_name
18.005176373 0 probe_perf:dso__adjust_kmod_long_name
20.005801605 0 probe_perf:dso__adjust_kmod_long_name
22.006467540 0 probe_perf:dso__adjust_kmod_long_name
^C 23.683261941 0 probe_perf:dso__adjust_kmod_long_name
#
Its not being used at all.
To further test this I used kvm.ko as the offline module, i.e. removed
if from the buildid-cache by nuking it completely (rm -rf ~/.debug) and
moved it from the normal kernel distro path, removed the modules, stoped
the kvm guest, and then installed it manually, etc.
# rmmod kvm-intel
# rmmod kvm
# lsmod | grep kvm
# modprobe kvm-intel
modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
modprobe: ERROR: could not insert 'kvm_intel': Unknown symbol in module, or unknown parameter (see dmesg)
# insmod ./kvm.ko
# modprobe kvm-intel
modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
# lsmod | grep kvm
kvm_intel 299008 0
kvm 765952 1 kvm_intel
irqbypass 16384 1 kvm
#
# perf probe -x ~/bin/perf machine__findnew_module_map:12 mname=m.name:string filename=filename:string 'dso_long_name=map->dso->long_name:string' 'dso_name=map->dso->name:string'
# perf probe -l
probe_perf:machine__findnew_module_map (on machine__findnew_module_map:12@util/machine.c in /home/acme/bin/perf with mname filename dso_long_name dso_name)
# perf record
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.416 MB perf.data (33956 samples) ]
# perf trace -e probe_perf:machine*
<SNIP>
6.322 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[salsa20_generic]", filename: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_long_name: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_name: "[salsa20_generic]")
6.375 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[kvm]", filename: "[kvm]", dso_long_name: "[kvm]", dso_name: "[kvm]")
<SNIP>
The filename doesn't come with the path, no point in trying to set the dso->long_name.
[root@quaco ~]# strace -e open,openat perf probe -m ./kvm.ko kvm_apic_local_deliver |& egrep 'open.*kvm'
openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 4
openat(AT_FDCWD, "/sys/module/kvm/notes/.note.gnu.build-id", O_RDONLY) = 4
openat(AT_FDCWD, "/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 7
openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 8
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/.debug/root/kvm.ko/5955f426cb93f03f30f3e876814be2db80ab0b55/probes", O_RDWR|O_CREAT, 0644) = 3
openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/.debug/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, ".debug/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 4
openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
[root@quaco ~]#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-jlfew3lyb24d58egrp0o72o2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Lets see if it helps:
First look at the probeable lines for the function that does lookups by
name in a map_groups struct:
# perf probe -x ~/bin/perf -L map_groups__find_by_name
<map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0>
0 struct map *map_groups__find_by_name(struct map_groups *mg, const char *name)
1 {
2 struct maps *maps = &mg->maps;
struct map *map;
5 down_read(&maps->lock);
7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) {
8 map = mg->last_search_by_name;
9 goto out_unlock;
}
12 maps__for_each_entry(maps, map)
13 if (strcmp(map->dso->short_name, name) == 0) {
14 mg->last_search_by_name = map;
15 goto out_unlock;
}
18 map = NULL;
out_unlock:
21 up_read(&maps->lock);
22 return map;
23 }
int dso__load_vmlinux(struct dso *dso, struct map *map,
const char *vmlinux, bool vmlinux_allocated)
#
Now add a probe to the place where we reuse the last search:
# perf probe -x ~/bin/perf map_groups__find_by_name:8
Added new event:
probe_perf:map_groups__find_by_name (on map_groups__find_by_name:8 in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:map_groups__find_by_name -aR sleep 1
#
Now lets do a system wide 'perf stat' counting those events:
# perf stat -e probe_perf:*
Leave it running and lets do a 'perf top', then, after a while, stop the
'perf stat':
# perf stat -e probe_perf:*
^C
Performance counter stats for 'system wide':
3,603 probe_perf:map_groups__find_by_name
44.565253139 seconds time elapsed
#
yeah, good to have.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-tcz37g3nxv3tvxw3q90vga3p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is only used for the kernel maps, shave 24 bytes out 'struct map'
and just traverse the existing per ip rbtree to look for maps by name,
use a front end cache to reuse the last search if its the same name.
After this 'struct map' is down to just two cachelines:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned; /* 40 1 */
/* XXX 3 bytes hole, try to pack */
u32 priv; /* 44 4 */
u32 prot; /* 48 4 */
u32 flags; /* 52 4 */
u64 pgoff; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 reloc; /* 64 8 */
u32 maj; /* 72 4 */
u32 min; /* 76 4 */
u64 ino; /* 80 8 */
u64 ino_generation; /* 88 8 */
u64 (*map_ip)(struct map *, u64); /* 96 8 */
u64 (*unmap_ip)(struct map *, u64); /* 104 8 */
struct dso * dso; /* 112 8 */
refcount_t refcnt; /* 120 4 */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 121, holes: 1, sum holes: 3 */
/* padding: 4 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-bvr8fqfgzxtgnhnwt5sssx5g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: small fixes and enhancements
- selftest improvements
- yield improvements
- cleanups
|
|
Add selftests validating mmap()-ing BPF array maps: both single-element and
multi-element ones. Check that plain bpf_map_update_elem() and
bpf_map_lookup_elem() work correctly with memory-mapped array. Also convert
CO-RE relocation tests to use memory-mapped views of global data.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-6-andriin@fb.com
|
|
Add detection of BPF_F_MMAPABLE flag support for arrays and add it as an extra
flag to internal global data maps, if supported by kernel. This allows users
to memory-map global data and use it without BPF map operations, greatly
simplifying user experience.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-5-andriin@fb.com
|
|
Add ability to memory-map contents of BPF array map. This is extremely useful
for working with BPF global data from userspace programs. It allows to avoid
typical bpf_map_{lookup,update}_elem operations, improving both performance
and usability.
There had to be special considerations for map freezing, to avoid having
writable memory view into a frozen map. To solve this issue, map freezing and
mmap-ing is happening under mutex now:
- if map is already frozen, no writable mapping is allowed;
- if map has writable memory mappings active (accounted in map->writecnt),
map freezing will keep failing with -EBUSY;
- once number of writable memory mappings drops to zero, map freezing can be
performed again.
Only non-per-CPU plain arrays are supported right now. Maps with spinlocks
can't be memory mapped either.
For BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc()
to be mmap()'able. We also need to make sure that array data memory is
page-sized and page-aligned, so we over-allocate memory in such a way that
struct bpf_array is at the end of a single page of memory with array->value
being aligned with the start of the second page. On deallocation we need to
accomodate this memory arrangement to free vmalloc()'ed memory correctly.
One important consideration regarding how memory-mapping subsystem functions.
Memory-mapping subsystem provides few optional callbacks, among them open()
and close(). close() is called for each memory region that is unmapped, so
that users can decrease their reference counters and free up resources, if
necessary. open() is *almost* symmetrical: it's called for each memory region
that is being mapped, **except** the very first one. So bpf_map_mmap does
initial refcnt bump, while open() will do any extra ones after that. Thus
number of close() calls is equal to number of open() calls plus one more.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-4-andriin@fb.com
|
|
If the clone3() syscall is not implemented we should skip the tests.
Fixes: 41585bbeeef9 ("selftests: add tests for clone3() with *set_tid")
Fixes: 17a810699c18 ("selftests: add tests for clone3()")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
This is a regression test case for an issue when pids have not been
released on error paths.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-3-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
In clone3_set_tid, a few test cases are running in a child process. And
right now, if one of these test cases fails, the whole test will exit
with the success status.
Fixes: 41585bbeeef9 ("selftests: add tests for clone3() with *set_tid")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-2-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Buffers have to be flushed before clone3() to avoid double messages in
the log.
Fixes: 41585bbeeef9 ("selftests: add tests for clone3() with *set_tid")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-1-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Lots of overlapping changes and parallel additions, stuff
like that.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix memory leak in xfrm_state code, from Steffen Klassert.
2) Fix races between devlink reload operations and device
setup/cleanup, from Jiri Pirko.
3) Null deref in NFC code, from Stephan Gerhold.
4) Refcount fixes in SMC, from Ursula Braun.
5) Memory leak in slcan open error paths, from Jouni Hogander.
6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
7) Info leak on short USB request answers in ax88172a driver, from
Oliver Neukum.
8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
9) PTP config timestamp flags validation, from Richard Cochran.
10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
ipmr: Fix skb headroom in ipmr_get_route().
net: hns3: cleanup of stray struct hns3_link_mode_mapping
net/smc: fix fastopen for non-blocking connect()
rds: ib: update WR sizes when bringing up connection
net: gemini: add missed free_netdev
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
seg6: fix skb transport_header after decap_and_validate()
seg6: fix srh pointer in get_srh()
net: stmmac: Use the correct style for SPDX License Identifier
octeontx2-af: Use the correct style for SPDX License Identifier
ptp: Extend the test program to check the external time stamp flags.
mlx5: Reject requests to enable time stamping on both edges.
igb: Reject requests that fail to enable time stamping on both edges.
dp83640: Reject requests to enable time stamping on both edges.
mv88e6xxx: Reject requests to enable time stamping on both edges.
ptp: Introduce strict checking of external time stamp options.
renesas: reject unsupported external timestamp flags
mlx5: reject unsupported external timestamp flags
igb: reject unsupported external timestamp flags
dp83640: reject unsupported external timestamp flags
...
|
|
tcp_mmap is used as a reference program for TCP rx zerocopy,
so it is important to point out some potential issues.
If multiple threads are concurrently using getsockopt(...
TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
functions compete on shared ptl lock, if vma are arbitrary placed.
Instead of letting the mm layer place the chunks back to back,
this patch enforces an alignment so that each thread uses
a different ptl lock.
Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
launched at the same time :
$ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
In the following run, we reproduce the old behavior by requesting no alignment :
$ tcp_mmap -sz -C $((128*1024)) -a 4096
received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
New behavior (automatic alignment based on Hugepagesize),
we can see the system overhead being dramatically reduced.
$ tcp_mmap -sz -C $((128*1024))
received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add tests that the now emulated iopl() functionality:
- does not longer allow user space to disable interrupts.
- does restore a I/O bitmap when IOPL is dropped
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Add code to the fork path which forces the shared bitmap to be duplicated
and the reference count to be dropped. Verify that the child modifications
did not affect the parent.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|