author    | Linus Torvalds <torvalds@linux-foundation.org> | 2021-11-08 09:25:26 -0800
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-11-08 09:25:26 -0800
commit    | bbdbeb0048b443082bcce5ed65a336bcc578a60e (patch)
tree      | c5a4a8b719e07c8f747dba8b3c9cb22f1124671b /tools/lib/list_sort.c
parent    | 1e9ed9360f80d13e41684ca458f01fdf922c7c57 (diff)
parent    | 6b491a86b77c0dc323ca49f3a29a0f67178b75f8 (diff)
Merge tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf annotate:
- Add riscv64 support.
- Add fusion logic for AMD microarchs.
perf record:
- Add an option to control the synthesizing behavior:
--synth <no|all|task|mmap|cgroup>
core:
- Allow controlling synthesizing PERF_RECORD_ metadata events during
record.
- perf.data reader prep work for multithreaded processing.
- Fix missing exclude_{host,guest} setting in PMUs that don't support
it, which was causing the feature detection code to disable it for all
events, even the ones in PMUs that do support it.
- Fix the default use of precise events on AMD, which was always
falling back to non-precise because perf_event_attr.exclude_guest=1
was set and IBS has no filtering capability, so it refuses
precise + exclude_guest.
- Add bitfield_swap() to handle branch_stack endian issue.
perf script:
- Show binary offsets for userspace addresses in callchains.
- Support instruction latency via new "ins_lat" selectable field.
- Add dlfilter-show-cycles
perf inject:
- Add vmlinux and ignore-vmlinux arguments, similar to other tools.
perf list:
- Display PMU prefix for partially supported hybrid cache events.
- Display hybrid PMU events with cpu type.
perf stat:
- Improve metrics documentation of data structures.
- Fix memory leaks in the metric code.
- Use NAN for missing event IDs.
- Don't compute unused events.
- Fix memory leak on error path.
- Encode and use metric-id as a metric qualifier.
- Allow metrics with no events.
- Avoid events for an 'if' constant result.
- Only add a referenced metric once.
- Simplify metric_refs calculation.
- Allow modifiers on metrics.
perf test:
- Add workload test of metric and metric groups.
- Workload test of all PMUs.
- vmlinux-kallsyms: Ignore hidden symbols.
- Add pmu-event test for event described as "config=".
- Verify more event members in pmu-events test.
- Add endian test for struct branch_flags on the sample-parsing test.
- Improve temp file cleanup in several tests.
perf daemon:
- Address MSAN warnings on send_cmd().
perf kmem:
- Improve man page for record options
perf srcline:
- Use long-running addr2line per DSO, greatly speeding up the
'srcline' sort order.
perf symbols:
- Ignore $a/$d symbols for ARM modules.
- Fix /proc/kcore access on 32 bit systems.
Kernel UAPI copies:
- Update copy of linux/socket.h with the kernel sources, no change in
tooling output.
libbpf:
- Pull in bpf_program__get_prog_info_linear() from libbpf, as it is
too specific to perf.
- Deprecate bpf_map__resize() in favor of bpf_map__set_max_entries().
- Install libbpf headers locally when building.
- Bump minimum LLVM C++ std to GNU++14.
libperf:
- Use binary search in perf_cpu_map__idx() as the array is sorted.
libtracefs:
- Enable libtracefs dynamic linking.
libtraceevent:
- Increase logging when verbose.
Arch specific:
* PowerPC:
- Add support to expose instruction and data address registers as
part of extended regs.
Vendor events:
* JSON parser:
- Support ConfigCode to set the config= in PMUs
- Make the JSON parser more conformant when in strict mode.
* All JSON files:
- Fix all remaining invalid JSON files.
* ARM:
- Syntax corrections in Neoverse N1 json.
- Categorise the Neoverse V1 counters.
- Add new armv8 PMU events.
- Revise hip08 uncore events.
Hardware tracing:
* auxtrace:
- Add missing Z option to ITRACE_HELP.
- Add itrace A option to approximate IPC.
- Add itrace d+o option to direct debug log to stdout.
* Intel PT:
- Add support for PERF_RECORD_AUX_OUTPUT_HW_ID
- Support itrace A option to approximate IPC
- Support itrace d+o option to direct debug log to stdout"
* tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits)
perf build: Install libbpf headers locally when building
perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1
perf metric: Fix memory leaks
perf parse-event: Add init and exit to parse_event_error
perf parse-events: Rename parse_events_error functions
perf stat: Fix memory leak on error path
perf tools: Use __BYTE_ORDER__
perf inject: Add vmlinux and ignore-vmlinux arguments
perf tools: Check vmlinux/kallsyms arguments in all tools
perf tools: Refactor out kernel symbol argument sanity checking
perf symbols: Ignore $a/$d symbols for ARM modules
perf evsel: Don't set exclude_guest by default
perf evsel: Fix missing exclude_{host,guest} setting
perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
perf beauty: Update copy of linux/socket.h with the kernel sources
perf clang: Fixes for more recent LLVM/clang
tools: Bump minimum LLVM C++ std to GNU++14
perf bpf: Pull in bpf_program__get_prog_info_linear()
Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t"
perf test sample-parsing: Add endian test for struct branch_flags
...
Diffstat (limited to 'tools/lib/list_sort.c')
-rw-r--r-- | tools/lib/list_sort.c | 252
1 file changed, 252 insertions(+), 0 deletions(-)
diff --git a/tools/lib/list_sort.c b/tools/lib/list_sort.c
new file mode 100644
index 0000000000000..10c067e3a8d2e
--- /dev/null
+++ b/tools/lib/list_sort.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/string.h>
+#include <linux/list_sort.h>
+#include <linux/list.h>
+
+/*
+ * Returns a list organized in an intermediate format suited
+ * to chaining of merge() calls: null-terminated, no reserved or
+ * sentinel head node, "prev" links not maintained.
+ */
+__attribute__((nonnull(2,3,4)))
+static struct list_head *merge(void *priv, list_cmp_func_t cmp,
+                               struct list_head *a, struct list_head *b)
+{
+        struct list_head *head, **tail = &head;
+
+        for (;;) {
+                /* if equal, take 'a' -- important for sort stability */
+                if (cmp(priv, a, b) <= 0) {
+                        *tail = a;
+                        tail = &a->next;
+                        a = a->next;
+                        if (!a) {
+                                *tail = b;
+                                break;
+                        }
+                } else {
+                        *tail = b;
+                        tail = &b->next;
+                        b = b->next;
+                        if (!b) {
+                                *tail = a;
+                                break;
+                        }
+                }
+        }
+        return head;
+}
+
+/*
+ * Combine final list merge with restoration of standard doubly-linked
+ * list structure. This approach duplicates code from merge(), but
+ * runs faster than the tidier alternatives of either a separate final
+ * prev-link restoration pass, or maintaining the prev links
+ * throughout.
+ */
+__attribute__((nonnull(2,3,4,5)))
+static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
+                        struct list_head *a, struct list_head *b)
+{
+        struct list_head *tail = head;
+        u8 count = 0;
+
+        for (;;) {
+                /* if equal, take 'a' -- important for sort stability */
+                if (cmp(priv, a, b) <= 0) {
+                        tail->next = a;
+                        a->prev = tail;
+                        tail = a;
+                        a = a->next;
+                        if (!a)
+                                break;
+                } else {
+                        tail->next = b;
+                        b->prev = tail;
+                        tail = b;
+                        b = b->next;
+                        if (!b) {
+                                b = a;
+                                break;
+                        }
+                }
+        }
+
+        /* Finish linking remainder of list b on to tail */
+        tail->next = b;
+        do {
+                /*
+                 * If the merge is highly unbalanced (e.g. the input is
+                 * already sorted), this loop may run many iterations.
+                 * Continue callbacks to the client even though no
+                 * element comparison is needed, so the client's cmp()
+                 * routine can invoke cond_resched() periodically.
+                 */
+                if (unlikely(!++count))
+                        cmp(priv, b, b);
+                b->prev = tail;
+                tail = b;
+                b = b->next;
+        } while (b);
+
+        /* And the final links to make a circular doubly-linked list */
+        tail->next = head;
+        head->prev = tail;
+}
+
+/**
+ * list_sort - sort a list
+ * @priv: private data, opaque to list_sort(), passed to @cmp
+ * @head: the list to sort
+ * @cmp: the elements comparison function
+ *
+ * The comparison function @cmp must return > 0 if @a should sort after
+ * @b ("@a > @b" if you want an ascending sort), and <= 0 if @a should
+ * sort before @b *or* their original order should be preserved. It is
+ * always called with the element that came first in the input in @a,
+ * and list_sort is a stable sort, so it is not necessary to distinguish
+ * the @a < @b and @a == @b cases.
+ *
+ * This is compatible with two styles of @cmp function:
+ * - The traditional style which returns <0 / =0 / >0, or
+ * - Returning a boolean 0/1.
+ * The latter offers a chance to save a few cycles in the comparison
+ * (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
+ *
+ * A good way to write a multi-word comparison is::
+ *
+ *      if (a->high != b->high)
+ *              return a->high > b->high;
+ *      if (a->middle != b->middle)
+ *              return a->middle > b->middle;
+ *      return a->low > b->low;
+ *
+ *
+ * This mergesort is as eager as possible while always performing at least
+ * 2:1 balanced merges. Given two pending sublists of size 2^k, they are
+ * merged to a size-2^(k+1) list as soon as we have 2^k following elements.
+ *
+ * Thus, it will avoid cache thrashing as long as 3*2^k elements can
+ * fit into the cache. Not quite as good as a fully-eager bottom-up
+ * mergesort, but it does use 0.2*n fewer comparisons, so is faster in
+ * the common case that everything fits into L1.
+ *
+ *
+ * The merging is controlled by "count", the number of elements in the
+ * pending lists. This is beautifully simple code, but rather subtle.
+ *
+ * Each time we increment "count", we set one bit (bit k) and clear
+ * bits k-1 .. 0. Each time this happens (except the very first time
+ * for each bit, when count increments to 2^k), we merge two lists of
+ * size 2^k into one list of size 2^(k+1).
+ *
+ * This merge happens exactly when the count reaches an odd multiple of
+ * 2^k, which is when we have 2^k elements pending in smaller lists,
+ * so it's safe to merge away two lists of size 2^k.
+ *
+ * After this happens twice, we have created two lists of size 2^(k+1),
+ * which will be merged into a list of size 2^(k+2) before we create
+ * a third list of size 2^(k+1), so there are never more than two pending.
+ *
+ * The number of pending lists of size 2^k is determined by the
+ * state of bit k of "count" plus two extra pieces of information:
+ *
+ * - The state of bit k-1 (when k == 0, consider bit -1 always set), and
+ * - Whether the higher-order bits are zero or non-zero (i.e.
+ *   is count >= 2^(k+1)).
+ *
+ * There are six states we distinguish. "x" represents some arbitrary
+ * bits, and "y" represents some arbitrary non-zero bits:
+ * 0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
+ * 1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * 2: x10x: 0 pending of size 2^k;     2^k + x pending of sizes < 2^k
+ * 3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * 4: y00x: 1 pending of size 2^k;     2^k + x pending of sizes < 2^k
+ * 5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * (merge and loop back to state 2)
+ *
+ * We gain lists of size 2^k in the 2->3 and 4->5 transitions (because
+ * bit k-1 is set while the more significant bits are non-zero) and
+ * merge them away in the 5->2 transition. Note in particular that just
+ * before the 5->2 transition, all lower-order bits are 11 (state 3),
+ * so there is one list of each smaller size.
+ *
+ * When we reach the end of the input, we merge all the pending
+ * lists, from smallest to largest. If you work through cases 2 to
+ * 5 above, you can see that the number of elements we merge with a list
+ * of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to
+ * 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
+ */
+__attribute__((nonnull(2,3)))
+void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
+{
+        struct list_head *list = head->next, *pending = NULL;
+        size_t count = 0;       /* Count of pending */
+
+        if (list == head->prev) /* Zero or one elements */
+                return;
+
+        /* Convert to a null-terminated singly-linked list. */
+        head->prev->next = NULL;
+
+        /*
+         * Data structure invariants:
+         * - All lists are singly linked and null-terminated; prev
+         *   pointers are not maintained.
+         * - pending is a prev-linked "list of lists" of sorted
+         *   sublists awaiting further merging.
+         * - Each of the sorted sublists is power-of-two in size.
+         * - Sublists are sorted by size and age, smallest & newest at front.
+         * - There are zero to two sublists of each size.
+         * - A pair of pending sublists are merged as soon as the number
+         *   of following pending elements equals their size (i.e.
+         *   each time count reaches an odd multiple of that size).
+         *   That ensures each later final merge will be at worst 2:1.
+         * - Each round consists of:
+         *   - Merging the two sublists selected by the highest bit
+         *     which flips when count is incremented, and
+         *   - Adding an element from the input as a size-1 sublist.
+         */
+        do {
+                size_t bits;
+                struct list_head **tail = &pending;
+
+                /* Find the least-significant clear bit in count */
+                for (bits = count; bits & 1; bits >>= 1)
+                        tail = &(*tail)->prev;
+                /* Do the indicated merge */
+                if (likely(bits)) {
+                        struct list_head *a = *tail, *b = a->prev;
+
+                        a = merge(priv, cmp, b, a);
+                        /* Install the merged result in place of the inputs */
+                        a->prev = b->prev;
+                        *tail = a;
+                }
+
+                /* Move one element from input list to pending */
+                list->prev = pending;
+                pending = list;
+                list = list->next;
+                pending->next = NULL;
+                count++;
+        } while (list);
+
+        /* End of input; merge together all the pending lists. */
+        list = pending;
+        pending = pending->prev;
+        for (;;) {
+                struct list_head *next = pending->prev;
+
+                if (!next)
+                        break;
+                list = merge(priv, cmp, pending, list);
+                pending = next;
+        }
+        /* The final merge, rebuilding prev links */
+        merge_final(priv, cmp, head, pending, list);
+}
+EXPORT_SYMBOL(list_sort);
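
As a usage illustration (not part of the commit), here is a minimal sketch of
calling list_sort() with a boolean-style cmp() of the multi-word form
recommended in the kernel-doc above. The struct item type, item_cmp() and the
sample values are hypothetical, and the sketch assumes the tools/include
copies of <linux/kernel.h>, <linux/list.h> and <linux/list_sort.h> are on the
include path (as when building under tools/), with tools/lib/list_sort.c
compiled alongside it.

/* Hypothetical demo, not from this commit: sort three items by a two-word
 * key using the boolean-style cmp() described in the list_sort() kernel-doc.
 * Assumes the tools/include headers are on the include path. */
#include <stdio.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/list_sort.h>

struct item {                           /* made-up demo type */
        struct list_head node;
        unsigned long high;
        unsigned long low;
};

/* Returns 1 when @a should sort after @b; ties keep their input order. */
static int item_cmp(void *priv, const struct list_head *a,
                    const struct list_head *b)
{
        const struct item *ia = list_entry(a, struct item, node);
        const struct item *ib = list_entry(b, struct item, node);

        if (ia->high != ib->high)
                return ia->high > ib->high;
        return ia->low > ib->low;
}

int main(void)
{
        LIST_HEAD(items);
        struct item nodes[] = {
                { .high = 2, .low = 1 },
                { .high = 1, .low = 9 },
                { .high = 1, .low = 3 },
        };
        struct item *it;
        size_t i;

        for (i = 0; i < ARRAY_SIZE(nodes); i++)
                list_add_tail(&nodes[i].node, &items);

        list_sort(NULL, &items, item_cmp);      /* expect 1:3, 1:9, 2:1 */

        list_for_each_entry(it, &items, node)
                printf("%lu:%lu\n", it->high, it->low);
        return 0;
}

The "count" bookkeeping documented above can also be seen in isolation. The
short standalone sketch below (again hypothetical, not from the commit)
mirrors the trailing-ones scan in list_sort() and prints, for each value of
count, whether the next round merges two pending sublists and of what size.

/* Standalone illustration of the merge schedule driven by "count":
 * mirror the trailing-ones scan from list_sort() and report which
 * merge, if any, the next round performs. */
#include <stdio.h>

int main(void)
{
        unsigned int count;

        for (count = 0; count < 16; count++) {
                unsigned int bits = count;
                unsigned int size = 1;  /* sublist size at the current depth */

                /* Skip one pending sublist per trailing 1 bit, doubling size */
                for (; bits & 1; bits >>= 1)
                        size <<= 1;

                if (bits)       /* not the first time this bit is set: merge */
                        printf("count=%2u -> merge two sublists of size %u\n",
                               count, size);
                else
                        printf("count=%2u -> no merge\n", count);
        }
        return 0;
}

For count = 0..7 this reports merges only at count = 2, 4 and 6 (size 1) and
count = 5 (size 2). Note that count here is the pre-increment value: the merge
reported at count = 5, for example, is the one performed as count reaches 6,
an odd multiple of 2^1, matching the rule stated in the comment above.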