summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2018-04-12Merge tag 'trace-v4.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few clean ups and bug fixes: - replace open coded "ARRAY_SIZE()" with macro - updates to uprobes - bug fix for perf event filter on error path" * tag 'trace-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Enforce passing in filter=NULL to create_filter() trace_uprobe: Simplify probes_seq_show() trace_uprobe: Use %lx to display offset tracing/uprobe: Add support for overlayfs tracing: Use ARRAY_SIZE() macro instead of open coding it
2018-04-11tracing: Enforce passing in filter=NULL to create_filter()Steven Rostedt (VMware)
There's some inconsistency with what to set the output parameter filterp when passing to create_filter(..., struct event_filter **filterp). Whatever filterp points to, should be NULL when calling this function. The create_filter() calls create_filter_start() with a pointer to a local "filter" variable that is set to NULL. The create_filter_start() has a WARN_ON() if the passed in pointer isn't pointing to a value set to NULL. Ideally, create_filter() should pass the filterp variable it received to create_filter_start() and not hide it as with a local variable, this allowed create_filter() to fail, and not update the passed in filter, and the caller of create_filter() then tried to free filter, which was never initialized to anything, causing memory corruption. Link: http://lkml.kernel.org/r/00000000000032a0c30569916870@google.com Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Reported-by: syzbot+dadcc936587643d7f568@syzkaller.appspotmail.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11trace_uprobe: Simplify probes_seq_show()Ravi Bangoria
Simplify probes_seq_show() function. No change in output before and after patch. Link: http://lkml.kernel.org/r/20180315082756.9050-2-ravi.bangoria@linux.vnet.ibm.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11trace_uprobe: Use %lx to display offsetRavi Bangoria
tu->offset is unsigned long, not a pointer, thus %lx should be used to print it, not the %px. Link: http://lkml.kernel.org/r/20180315082756.9050-1-ravi.bangoria@linux.vnet.ibm.com Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 0e4d819d0893 ("trace_uprobe: Display correct offset in uprobe_events") Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11tracing/uprobe: Add support for overlayfsHoward McLauchlan
uprobes cannot successfully attach to binaries located in a directory mounted with overlayfs. To verify, create directories for mounting overlayfs (upper,lower,work,merge), move some binary into merge/ and use readelf to obtain some known instruction of the binary. I used /bin/true and the entry instruction(0x13b0): $ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merge $ cd /sys/kernel/debug/tracing $ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events $ echo 1 > events/uprobes/true_entry/enable This returns 'bash: echo: write error: Input/output error' and dmesg tells us 'event trace: Could not enable event true_entry' This change makes create_trace_uprobe() look for the real inode of a dentry. In the case of normal filesystems, this simplifies to just returning the inode. In the case of overlayfs(and similar fs) we will obtain the underlying dentry and corresponding inode, upon which uprobes can successfully register. Running the example above with the patch applied, we can see that the uprobe is enabled and will output to trace as expected. Link: http://lkml.kernel.org/r/20180410231030.2720-1-hmclauchlan@fb.com Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11tracing: Use ARRAY_SIZE() macro instead of open coding itJérémy Lefaure
It is useless to re-invent the ARRAY_SIZE macro so let's use it instead of DATA_CNT. Found with Coccinelle with the following semantic patch: @r depends on (org || report)@ type T; T[] E; position p; @@ ( (sizeof(E)@p /sizeof(*E)) | (sizeof(E)@p /sizeof(E[...])) | (sizeof(E)@p /sizeof(T)) ) Link: http://lkml.kernel.org/r/20171016012250.26453-1-jeremy.lefaure@lse.epita.fr Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> [ Removed useless include of kernel.h ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11bpf/tracing: fix a deadlock in perf_event_detach_bpf_progYonghong Song
syzbot reported a possible deadlock in perf_event_detach_bpf_prog. The error details: ====================================================== WARNING: possible circular locking dependency detected 4.16.0-rc7+ #3 Not tainted ------------------------------------------------------ syz-executor7/24531 is trying to acquire lock: (bpf_event_mutex){+.+.}, at: [<000000008a849b07>] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854 but task is already holding lock: (&mm->mmap_sem){++++}, at: [<0000000038768f87>] vm_mmap_pgoff+0x198/0x280 mm/util.c:353 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++}: __might_fault+0x13a/0x1d0 mm/memory.c:4571 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694 perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891 _perf_ioctl kernel/events/core.c:4750 [inline] perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #0 (bpf_event_mutex){+.+.}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854 perf_event_free_bpf_prog kernel/events/core.c:8147 [inline] _free_event+0xbdb/0x10f0 kernel/events/core.c:4116 put_event+0x24/0x30 kernel/events/core.c:4204 perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172 remove_vma+0xb4/0x1b0 mm/mmap.c:172 remove_vma_list mm/mmap.c:2490 [inline] do_munmap+0x82a/0xdf0 mm/mmap.c:2731 mmap_region+0x59e/0x15a0 mm/mmap.c:1646 do_mmap+0x6c0/0xe00 mm/mmap.c:1483 do_mmap_pgoff include/linux/mm.h:2223 [inline] vm_mmap_pgoff+0x1de/0x280 mm/util.c:355 SYSC_mmap_pgoff mm/mmap.c:1533 [inline] SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491 SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline] SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(bpf_event_mutex); lock(&mm->mmap_sem); lock(bpf_event_mutex); *** DEADLOCK *** ====================================================== The bug is introduced by Commit f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") where copy_to_user, which requires mm->mmap_sem, is called inside bpf_event_mutex lock. At the same time, during perf_event file descriptor close, mm->mmap_sem is held first and then subsequent perf_event_detach_bpf_prog needs bpf_event_mutex lock. Such a senario caused a deadlock. As suggested by Daniel, moving copy_to_user out of the bpf_event_mutex lock should fix the problem. Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-10Merge tag 'trace-v4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features: - Tom Zanussi's extended histogram work. This adds the synthetic events to have histograms from multiple event data Adds triggers "onmatch" and "onmax" to call the synthetic events Several updates to the histogram code from this - Allow way to nest ring buffer calls in the same context - Allow absolute time stamps in ring buffer - Rewrite of filter code parsing based on Al Viro's suggestions - Setting of trace_clock to global if TSC is unstable (on boot) - Better OOM handling when allocating large ring buffers - Added initcall tracepoints (consolidated initcall_debug code with them) And other various fixes and clean ups" * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) init: Have initcall_debug still work without CONFIG_TRACEPOINTS init, tracing: Have printk come through the trace events for initcall_debug init, tracing: instrument security and console initcall trace events init, tracing: Add initcall trace events tracing: Add rcu dereference annotation for test func that touches filter->prog tracing: Add rcu dereference annotation for filter->prog tracing: Fixup logic inversion on setting trace_global_clock defaults tracing: Hide global trace clock from lockdep ring-buffer: Add set/clear_current_oom_origin() during allocations ring-buffer: Check if memory is available before allocation lockdep: Add print_irqtrace_events() to __warn vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK) tracing: Uninitialized variable in create_tracing_map_fields() tracing: Make sure variable string fields are NULL-terminated tracing: Add action comparisons when testing matching hist triggers tracing: Don't add flag strings when displaying variable references tracing: Fix display of hist trigger expressions containing timestamps ftrace: Drop a VLA in module_exists() tracing: Mention trace_clock=global when warning about unstable clocks tracing: Default to using trace_global_clock if sched_clock is unstable ...
2018-04-10tracing/uprobe_event: Fix strncpy corner caseMasami Hiramatsu
Fix string fetch function to terminate with NUL. It is OK to drop the rest of string. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org Cc: 范龙飞 <long7573@126.com> Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-10perf/core: Fix perf_uprobe_init()Song Liu
Similarly to the uprobe PMU fix in perf_kprobe_init(), fix error handling in perf_uprobe_init() as well. Reported-by: 范龙飞 <long7573@126.com> Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: e12f03d7031a ("perf/core: Implement the 'perf_kprobe' PMU") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-10perf/core: Fix perf_kprobe_init()Masami Hiramatsu
Fix error handling in perf_kprobe_init(): ================================================================== BUG: KASAN: slab-out-of-bounds in strlen+0x8e/0xa0 lib/string.c:482 Read of size 1 at addr ffff88003f9cc5c0 by task syz-executor2/23095 CPU: 0 PID: 23095 Comm: syz-executor2 Not tainted 4.16.0+ #24 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x6e/0x2c0 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x256/0x380 mm/kasan/report.c:412 strlen+0x8e/0xa0 lib/string.c:482 kstrdup+0x21/0x70 mm/util.c:55 alloc_trace_kprobe+0xc8/0x930 kernel/trace/trace_kprobe.c:325 create_local_trace_kprobe+0x4f/0x3a0 kernel/trace/trace_kprobe.c:1438 perf_kprobe_init+0x149/0x1f0 kernel/trace/trace_event_perf.c:264 perf_kprobe_event_init+0xa8/0x120 kernel/events/core.c:8407 perf_try_init_event+0xcb/0x2a0 kernel/events/core.c:9719 perf_init_event kernel/events/core.c:9750 [inline] perf_event_alloc+0x1367/0x1e20 kernel/events/core.c:10022 SYSC_perf_event_open+0x242/0x2330 kernel/events/core.c:10477 do_syscall_64+0x198/0x640 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Reported-by: 范龙飞 <long7573@126.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: e12f03d7031a ("perf/core: Implement the 'perf_kprobe' PMU") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-06tracing: Add rcu dereference annotation for test func that touches filter->progSteven Rostedt (VMware)
A boot up test function update_pred_fn() dereferences filter->prog without the proper rcu annotation. To do this, we must also take the event_mutex first. Normally, this isn't needed because this test function can not race with other use cases that touch the event filters (it is disabled if any events are enabled). Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Add rcu dereference annotation for filter->progSteven Rostedt (VMware)
ftrace_function_set_filter() referenences filter->prog without annotation and sparse complains about it. It needs a rcu_dereference_protected() wrapper. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Fixup logic inversion on setting trace_global_clock defaultsChris Wilson
In commit 932066a15335 ("tracing: Default to using trace_global_clock if sched_clock is unstable"), the logic for deciding to override the default clock if unstable was reversed from the earlier posting. I was trying to reduce the width of the message by using an early return rather than a if-block, but reverted back to using the if-block and accidentally left the predicate inverted. Link: http://lkml.kernel.org/r/20180404212450.26646-1-chris@chris-wilson.co.uk Fixes: 932066a15335 ("tracing: Default to using trace_global_clock if sched_clock is unstable") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Hide global trace clock from lockdepSteven Rostedt (VMware)
Function tracing can trace in NMIs and such. If the TSC is determined to be unstable, the tracing clock will switch to the global clock on boot up, unless "trace_clock" is specified on the kernel command line. The global clock disables interrupts to access sched_clock_cpu(), and in doing so can be done within lockdep internals (because of function tracing and NMIs). This can trigger false lockdep splats. The trace_clock_global() is special, best not to trace the irq logic within it. Link: http://lkml.kernel.org/r/20180404145015.77bde42d@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06ring-buffer: Add set/clear_current_oom_origin() during allocationsSteven Rostedt (VMware)
As si_mem_available() can say there is enough memory even though the memory available is not useable by the ring buffer, it is best to not kill innocent applications because the ring buffer is taking up all the memory while it is trying to allocate a great deal of memory. If the allocator is user space (because kernel threads can also increase the size of the kernel ring buffer on boot up), then after si_mem_available() says there is enough memory, set the OOM killer to kill the current task if an OOM triggers during the allocation. Link: http://lkml.kernel.org/r/20180404062340.GD6312@dhcp22.suse.cz Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06ring-buffer: Check if memory is available before allocationSteven Rostedt (VMware)
The ring buffer is made up of a link list of pages. When making the ring buffer bigger, it will allocate all the pages it needs before adding to the ring buffer, and if it fails, it frees them and returns an error. This makes increasing the ring buffer size an all or nothing action. When this was first created, the pages were allocated with "NORETRY". This was to not cause any Out-Of-Memory (OOM) actions from allocating the ring buffer. But NORETRY was too strict, as the ring buffer would fail to expand even when there's memory available, but was taken up in the page cache. Commit 848618857d253 ("tracing/ring_buffer: Try harder to allocate") changed the allocating from NORETRY to RETRY_MAYFAIL. The RETRY_MAYFAIL would allocate from the page cache, but if there was no memory available, it would simple fail the allocation and not trigger an OOM. This worked fine, but had one problem. As the ring buffer would allocate one page at a time, it could take up all memory in the system before it failed to allocate and free that memory. If the allocation is happening and the ring buffer allocates all memory and then tries to take more than available, its allocation will not trigger an OOM, but if there's any allocation that happens someplace else, that could trigger an OOM, even though once the ring buffer's allocation fails, it would free up all the previous memory it tried to allocate, and allow other memory allocations to succeed. Commit d02bd27bd33dd ("mm/page_alloc.c: calculate 'available' memory in a separate function") separated out si_mem_availble() as a separate function that could be used to see how much memory is available in the system. Using this function to make sure that the ring buffer could be allocated before it tries to allocate pages we can avoid allocating all memory in the system and making it vulnerable to OOMs if other allocations are taking place. Link: http://lkml.kernel.org/r/1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com CC: stable@vger.kernel.org Cc: linux-mm@kvack.org Fixes: 848618857d253 ("tracing/ring_buffer: Try harder to allocate") Requires: d02bd27bd33dd ("mm/page_alloc.c: calculate 'available' memory in a separate function") Reported-by: Zhaoyang Huang <huangzhaoyang@gmail.com> Tested-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Uninitialized variable in create_tracing_map_fields()Dan Carpenter
Smatch complains that idx can be used uninitialized when we check if (idx < 0). It has to be the first iteration through the loop and the HIST_FIELD_FL_STACKTRACE bit has to be clear and the HIST_FIELD_FL_VAR bit has to be set to reach the bug. Link: http://lkml.kernel.org/r/20180328114815.GC29050@mwanda Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers") Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Make sure variable string fields are NULL-terminatedTom Zanussi
The strncpy() currently being used for variable string fields can result in a lack of termination if the string length is equal to the field size. Use the safer strscpy() instead, which will guarantee termination. Link: http://lkml.kernel.org/r/fb97c1e518fb358c12a4057d7445ba2c46956cd7.1522256721.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Add action comparisons when testing matching hist triggersTom Zanussi
Actions also need to be considered when checking for matching triggers - triggers differing only by action should be allowed, but currently aren't because the matching check ignores the action and erroneously returns -EEXIST. Add and call an actions_match() function to address that. Here's an example using onmatch() actions. The first -EEXIST shouldn't occur because the onmatch() is different in the second wakeup_latency() param. The second -EEXIST shouldn't occur because it's a different action (in this case, it doesn't have an action, so shouldn't be seen as being the same and therefore rejected). In the after case, both are correctly accepted (and trying to add one of them again returns -EEXIST as it should). before: # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger -su: echo: write error: File exists # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger -su: echo: write error: File exists after: # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger Link: http://lkml.kernel.org/r/a7fd668b87ec10736c8f016ac4279c8480d50c2b.1522256721.git.tom.zanussi@linux.intel.com Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Don't add flag strings when displaying variable referencesTom Zanussi
Variable references should never have flags appended when displayed - prevent that from happening. Before: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:... After: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:... Link: http://lkml.kernel.org/r/913318a5610ef6b24af2522575f671fa6ee19b6b.1522256721.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Fix display of hist trigger expressions containing timestampsTom Zanussi
When displaying hist triggers, variable references that have the timestamp field flag set are erroneously displayed as common_timestamp rather than the variable reference. Additionally, timestamp expressions are displayed in the same way. Fix this by forcing the timestamp flag handling to follow variable reference and expression handling. Before: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs:... After: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:... Link: http://lkml.kernel.org/r/92746b06be67499c2a6217bd55395b350ad18fad.1522256721.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06ftrace: Drop a VLA in module_exists()Salvatore Mesoraca
Avoid a VLA by using a real constant expression instead of a variable. The compiler should be able to optimize the original code and avoid using an actual VLA. Anyway this change is useful because it will avoid a false positive with -Wvla, it might also help the compiler generating better code. Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/1522399988-8815-1-git-send-email-s.mesoraca16@gmail.com Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Mention trace_clock=global when warning about unstable clocksChris Wilson
Mention the alternative of adding trace_clock=global to the kernel command line when we detect that we've used an unstable clock across a suspend/resume cycle. Link: http://lkml.kernel.org/r/20180330150132.16903-2-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06tracing: Default to using trace_global_clock if sched_clock is unstableChris Wilson
Across suspend, we may see a very large drift in timestamps if the sched clock is unstable, prompting the global trace's ringbuffer code to warn and suggest switching to the global clock. Preempt this request by detecting when the sched clock is unstable (determined during late_initcall) and automatically switching the default clock over to trace_global_clock. This should prevent requiring user interaction to resolve warnings such as: Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857 If you just came from a suspend/resume, please switch to the trace global clock: echo global > /sys/kernel/debug/tracing/trace_clock Link: http://lkml.kernel.org/r/20180330150132.16903-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kfifo: fix inaccurate comment tools/thermal: tmon: fix for segfault net: Spelling s/stucture/structure/ edd: don't spam log if no EDD information is present Documentation: Fix early-microcode.txt references after file rename tracing: Block comments should align the * on each line treewide: Fix typos in printk GenWQE: Fix a typo in two comments treewide: Align function definition open/close braces
2018-04-04Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time(r) updates from Thomas Gleixner: "A small set of updates for timers and timekeeping: - The most interesting change is the consolidation of clock MONOTONIC and clock BOOTTIME. Clock MONOTONIC behaves now exactly like clock BOOTTIME and does not longer ignore the time spent in suspend. A new clock MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in kernels before this change. This allows applications to programmatically check for the clock MONOTONIC behaviour. As discussed in the review thread, this has the potential of breaking user space and we might have to revert this. Knock on wood that we can avoid that exercise. - Updates to the NTP mechanism to improve accuracy - A new kernel internal data structure to aid the ongoing Y2038 work. - Cleanups and simplifications of the clocksource code. - Make the alarmtimer code play nicely with debugobjects" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Init nanosleep alarm timer on stack y2038: Introduce struct __kernel_old_timeval tracing: Unify the "boot" and "mono" tracing clocks hrtimer: Unify MONOTONIC and BOOTTIME clock behavior posix-timers: Unify MONOTONIC and BOOTTIME clock behavior timekeeping: Remove boot time specific code Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock timekeeping/ntp: Determine the multiplier directly from NTP tick length timekeeping/ntp: Don't align NTP frequency adjustments to ticks clocksource: Use ATTRIBUTE_GROUPS clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes clocksource: Don't walk the clocksource list for empty override
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bpf: Check attach type at prog load timeAndrey Ignatov
== The problem == There are use-cases when a program of some type can be attached to multiple attach points and those attach points must have different permissions to access context or to call helpers. E.g. context structure may have fields for both IPv4 and IPv6 but it doesn't make sense to read from / write to IPv6 field when attach point is somewhere in IPv4 stack. Same applies to BPF-helpers: it may make sense to call some helper from some attach point, but not from other for same prog type. == The solution == Introduce `expected_attach_type` field in in `struct bpf_attr` for `BPF_PROG_LOAD` command. If scenario described in "The problem" section is the case for some prog type, the field will be checked twice: 1) At load time prog type is checked to see if attach type for it must be known to validate program permissions correctly. Prog will be rejected with EINVAL if it's the case and `expected_attach_type` is not specified or has invalid value. 2) At attach time `attach_type` is compared with `expected_attach_type`, if prog type requires to have one, and, if they differ, attach will be rejected with EINVAL. The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` () and can be used to check context accesses and calls to helpers correspondingly. Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and Daniel Borkmann <daniel@iogearbox.net> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-28bpf: introduce BPF_RAW_TRACEPOINTAlexei Starovoitov
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel internal arguments of the tracepoints in their raw form. >From bpf program point of view the access to the arguments look like: struct bpf_raw_tracepoint_args { __u64 args[0]; }; int bpf_prog(struct bpf_raw_tracepoint_args *ctx) { // program can read args[N] where N depends on tracepoint // and statically verified at program load+attach time } kprobe+bpf infrastructure allows programs access function arguments. This feature allows programs access raw tracepoint arguments. Similar to proposed 'dynamic ftrace events' there are no abi guarantees to what the tracepoints arguments are and what their meaning is. The program needs to type cast args properly and use bpf_probe_read() helper to access struct fields when argument is a pointer. For every tracepoint __bpf_trace_##call function is prepared. In assembler it looks like: (gdb) disassemble __bpf_trace_xdp_exception Dump of assembler code for function __bpf_trace_xdp_exception: 0xffffffff81132080 <+0>: mov %ecx,%ecx 0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3> where TRACE_EVENT(xdp_exception, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, u32 act), The above assembler snippet is casting 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is. All of ~500 of __bpf_trace_*() functions are only 5-10 byte long and in total this approach adds 7k bytes to .text. This approach gives the lowest possible overhead while calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf are used at speeds of 1M+ events per second this is valuable optimization. The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced that returns anon_inode FD of 'bpf-raw-tracepoint' object. The user space looks like: // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type prog_fd = bpf_prog_load(...); // receive anon_inode fd for given bpf_raw_tracepoint with prog attached raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd); Ctrl-C of tracing daemon or cmdline tool that uses this feature will automatically detach bpf program, unload it and unregister tracepoint probe. On the kernel side the __bpf_raw_tp_map section of pointers to tracepoint definition and to __bpf_trace_*() probe function is used to find a tracepoint with "xdp_exception" name and corresponding __bpf_trace_xdp_exception() probe function which are passed to tracepoint_probe_register() to connect probe with tracepoint. Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted. Each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs. In the future bpf_raw_tracepoints can be extended with query/introspection logic. __bpf_raw_tp_map section logic was contributed by Steven Rostedt Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-27tracing: Block comments should align the * on each lineRohit Visavalia
Resolved Block comments use * on subsequent lines checkpatch warning. Issue found by checkpatch. Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-26treewide: Align function definition open/close bracesJoe Perches
Some functions definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-23Merge tag 'trace-v4.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull kprobe fixes from Steven Rostedt: "The documentation for kprobe events says that symbol offets can take both a + and - sign to get to befor and after the symbol address. But in actuality, the code does not support the minus. This fixes that issue, and adds a few more selftests to kprobe events" * tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add a testcase for probepoint selftests: ftrace: Add a testcase for string type with kprobe_event selftests: ftrace: Add probe event argument syntax testcase tracing: probeevent: Fix to support minus offset from symbol
2018-03-23tracing: probeevent: Fix to support minus offset from symbolMasami Hiramatsu
In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tracing: Fix a potential NULL dereferenceDan Carpenter
We forgot to set the error code on this path so we return ERR_PTR(0) which is NULL. It results in a NULL dereference in the caller. Link: http://lkml.kernel.org/r/20180323113735.GC28518@mwanda Fixes: 100719dcef44 ("tracing: Add simple expression support to hist triggers") Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-20trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programsYonghong Song
Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") added helper bpf_perf_prog_read_value so that perf_event type program can read event counter and enabled/running time. This commit, however, introduced a bug which allows this helper for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value needs to access perf_event through its bpf_perf_event_data_kern type context, which is not available for tracepoint type program. This patch fixed the issue by separating bpf_func_proto between tracepoint and perf_event type programs and removed bpf_perf_prog_read_value from tracepoint func prototype. Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-14tracing: Rewrite filter logic to be simpler and fasterSteven Rostedt (VMware)
Al Viro reviewed the filter logic of ftrace trace events and found it to be very troubling. It creates a binary tree based on the logic operators and walks it during tracing. He sent myself and Tom Zanussi a long explanation (and formal proof) of how to do the string parsing better and end up with a program array that can be simply iterated to come up with the correct results. I took his ideas and his pseudo code and rewrote the filter logic based on them. In doing so, I was able to remove a lot of code, and have a much more condensed filter logic in the process. I wrote a very long comment describing the methadology that Al proposed in my own words. For more info on how this works, read the comment above predicate_parse(). Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14tracing: Clean up and document pred_funcs_##type creation and useSteven Rostedt (VMware)
The pred_funcs_##type arrays consist of five functions that are assigned based on the ops. The array must be in the same order of the ops each function represents. The PRED_FUNC_START macro denotes the op enum that starts the op that maps to the pred_funcs_##type arrays. This is all very subtle and prone to bugs if the code is changed. Add comments describing how PRED_FUNC_START and pred_funcs_##type array is used, and also a PRED_FUNC_MAX that is the maximum number of functions in the arrays. Clean up select_comparison_fn() that assigns the predicates to the pred_funcs_##type array function as well as add protection in case an op is passed in that does not map correctly to the array. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14tracing: Combine enum and arrays into single macro in filter codeSteven Rostedt (VMware)
Instead of having a separate enum that is the index into another array, like a string array, make a single macro that combines them into a single list, and then the two can not get out of sync. This makes it easier to add and remove items. The macro trick is: #define DOGS \ C( JACK, "Jack Russell") \ C( ITALIAN, "Italian Greyhound") \ C( GERMAN, "German Shepherd") #undef C #define C(a, b) a enum { DOGS }; #undef C #define C(a, b) b static char dogs[] = { DOGS }; Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-13tracing: Unify the "boot" and "mono" tracing clocksThomas Gleixner
Unify the "boot" and "mono" tracing clocks and document the new behaviour. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.489635255@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-10tracing: Embed replace_filter_string() helper functionSteven Rostedt (VMware)
The replace_filter_string() frees the current string and then copies a given string. But in the two locations that it was used, the allocation happened right after the filter was allocated (nothing to replace). There's no need for this to be a helper function. Embedding the allocation in the two places where it was called will make changing the code in the future easier. Also make the variable consistent (always use "filter_string" as the name, as it was used in one instance as "filter_str") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Only add filter list when neededSteven Rostedt (VMware)
replace_system_preds() creates a filter list to free even when it doesn't really need to have it. Only save filters that require synchronize_sched() in the filter list to free. This will allow the code to be updated a bit easier in the future. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Remove filter allocator helperSteven Rostedt (VMware)
The __alloc_filter() function does nothing more that allocate the filter. There's no reason to have it as a helper function. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Use trace_seq instead of open code string appendingSteven Rostedt (VMware)
The filter code does open code string appending to produce an error message. Instead it can be simplified by using trace_seq function helpers. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Remove BUG_ON() from append_filter_string()Steven Rostedt (VMware)
There's no reason to BUG if there's a bug in the filtering code. Simply do a warning and return. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add inter-event blurb to HIST_TRIGGERS config optionTom Zanussi
So that users know that inter-event tracing is supported as part of the HIST_TRIGGERS option, include text to that effect in the help text. Link: http://lkml.kernel.org/r/a38e24231d8d980be636b56d35814570acfd167a.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Use the ring-buffer nesting to allow synthetic events to be tracedSteven Rostedt (VMware)
Synthetic events can be done within the recording of other events. Notify the ring buffer via ring_buffer_nest_start() and ring_buffer_nest_end() that this is intended and not to block it due to its recursion protection. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>